M.Tech Thesis: Performance Evaluation of Simultaneous Perturbation Methods for Simulation Optimization and Policy Learning

The objective is to apply simultaneous perturbation (SP) methods to reinforcement learning (RL) problems.

We implement two SP methods, simultaneous perturbation stochastic approximation (SPSA) and random directions stochastic approximation (RDSA), each paired with a neural-network-based function approximator. We evaluate these algorithms on common discrete and continuous control environments and compare their performance with the popular REINFORCE algorithm.

The experimental studies show that SPSA (i) is easy to implement, (ii) takes less time to train, (iii) requires only two function measurements per iteration, and (iv) outperforms REINFORCE on the walking robot task.
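To illustrate the "two function measurements per iteration" property, here is a minimal NumPy sketch of the SPSA and RDSA gradient estimators on a toy quadratic objective. It is not the code from the scripts in this repository: the names `sp_gradient`, `minimize`, and `quadratic`, the Gaussian choice for the RDSA direction, and the constant gain values are all assumptions made for illustration. In the RL setting, `f` would instead be a noisy estimate of the negative episode return as a function of the flattened policy-network weights.

```python
import numpy as np


def sp_gradient(f, theta, delta, rng, method="spsa"):
    """Estimate the gradient of f at theta from two function measurements."""
    if method == "spsa":
        # SPSA: independent +/-1 (Rademacher) perturbation for every coordinate.
        d = rng.choice([-1.0, 1.0], size=theta.shape)
        scale = 1.0 / d
    else:
        # RDSA (assumed variant): random direction from a standard Gaussian.
        d = rng.standard_normal(theta.shape)
        scale = d
    # The same two measurements are reused for all coordinates.
    diff = f(theta + delta * d) - f(theta - delta * d)
    return diff / (2.0 * delta) * scale


def minimize(f, theta0, method="spsa", iters=2000, step=0.05, delta=0.1, seed=0):
    """Run stochastic approximation with SP gradients; also count calls to f."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    calls = 0

    def counted(x):
        nonlocal calls
        calls += 1
        return f(x)

    for _ in range(iters):
        # Constant gains for brevity; decaying gain sequences are the usual
        # choice when the measurements of f are noisy.
        theta -= step * sp_gradient(counted, theta, delta, rng, method)
    return theta, calls


def quadratic(x):
    """Hypothetical toy objective with its minimum at x = (1, ..., 1)."""
    return float(np.sum((x - 1.0) ** 2))


if __name__ == "__main__":
    for method in ("spsa", "rdsa"):
        theta, calls = minimize(quadratic, np.zeros(5), method=method)
        print(method, "function calls:", calls, "max error:", np.abs(theta - 1.0).max())
```

The number of function evaluations is `2 * iters` regardless of the parameter dimension, which is what makes SP methods attractive when each evaluation is a full simulated episode.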

Learning_curve_acrobot.png
Learning_curve_biped_walker.png
Learning_curve_cartpole.png
Learning_curve_mountaincar.png
README.md
RL problems.png
Thesis_Report.pdf
rdsa_acrobot.py
rdsa_biped.py
rdsa_cartpole.py
rdsa_mountaincar.py
spsa_acrobot.py
spsa_biped.py
spsa_cartpole.py
spsa_mountaincar.py
tensorflow_version.py

monika58/Mtech-Thesis-Project
