Off-Policy Deep Reinforcement Learning without Exploration

Code corresponding to the paper. If you use our code, please cite the paper.

The method is tested on MuJoCo continuous control tasks in OpenAI Gym. Networks are trained using PyTorch 0.4 and Python 2.7.

Overview

The main algorithm, Batch-Constrained Q-learning (BCQ), can be found in BCQ.py.
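BCQ's key change to off-policy Q-learning is how it picks actions. Instead of maximizing Q over the whole action space, it only considers actions that resemble those in the batch: it samples candidates from a generative model of the data, applies a small bounded perturbation, and takes the best candidate under Q. The sketch below is a simplified NumPy reading of that selection step, not the code in BCQ.py. It assumes the generative model, perturbation model, and Q-function are plain callables (in BCQ.py they are PyTorch networks). The names `bcq_select_action`, `toy_generator`, `toy_perturbation`, and `toy_q`, along with the defaults for `n_samples` and `phi`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def bcq_select_action(state, generator, perturbation, q_value,
                      n_samples=10, phi=0.05, max_action=1.0, rng=None):
    """Batch-constrained action selection (simplified sketch).

    generator(states, rng)         -> candidate actions, shape (n, action_dim)
    perturbation(states, actions)  -> raw adjustments,   shape (n, action_dim)
    q_value(states, actions)       -> Q estimates,       shape (n,)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Repeat the state so every candidate action is scored against it.
    states = np.repeat(np.asarray(state, dtype=float)[None, :], n_samples, axis=0)
    # 1) Sample candidates from a model of the batch's actions.
    candidates = generator(states, rng)
    # 2) tanh keeps each adjustment inside [-phi, phi] * max_action.
    xi = phi * max_action * np.tanh(perturbation(states, candidates))
    actions = np.clip(candidates + xi, -max_action, max_action)
    # 3) Act greedily, but only over the perturbed candidates.
    return actions[np.argmax(q_value(states, actions))]


# Hypothetical toy stand-ins with 1-D actions, for illustration only.
def toy_generator(states, rng):
    # Pretend every action in the batch lies near 0.5.
    return np.clip(rng.normal(0.5, 0.1, size=(len(states), 1)), -1.0, 1.0)


def toy_perturbation(states, actions):
    return np.ones_like(actions)  # always nudge upward by phi * tanh(1)


def toy_q(states, actions):
    return -(actions[:, 0] - 0.9) ** 2  # Q peaks at 0.9, outside the batch


action = bcq_select_action(np.zeros(3), toy_generator, toy_perturbation, toy_q,
                           rng=np.random.default_rng(0))
print(action)
```

With these stand-ins, the chosen action stays near the batch actions instead of jumping to 0.9, where Q is highest but which the data never covers. Avoiding that extrapolation is what the batch constraint is for.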

To reproduce some of the results from the paper, first train an expert policy (DDPG) by running train_expert.py, which saves the expert model. Then collect a new buffer by running generate_buffer.py, either with the default settings or after adjusting the settings in the code.
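The buffer-collection step above can be sketched as follows. This is a hedged illustration, not generate_buffer.py itself. `ToyEnv` replaces a Gym MuJoCo task so the snippet runs without MuJoCo installed, and `expert_policy` is a hypothetical stand-in for the saved DDPG actor. Adding Gaussian noise with `noise_std=0.3` is an assumed way of making the data imperfect; the real script's settings may differ. `ReplayBuffer` and `collect_buffer` are also hypothetical names.

```python
import numpy as np


class ReplayBuffer:
    """Minimal transition store (hypothetical; not the repo's own buffer class)."""

    def __init__(self):
        self.storage = []

    def add(self, state, action, next_state, reward, done):
        self.storage.append((state, action, next_state, reward, float(done)))

    def __len__(self):
        return len(self.storage)


class ToyEnv:
    """Hypothetical stand-in for a MuJoCo task: steer a 1-D point to the origin."""

    def __init__(self, horizon=50):
        self.horizon = horizon
        self.t, self.x = 0, np.array([1.0])

    def reset(self):
        self.t, self.x = 0, np.array([1.0])
        return self.x.copy()

    def step(self, action):
        self.t += 1
        self.x = self.x + 0.1 * action
        reward = -abs(float(self.x[0]))
        done = self.t >= self.horizon
        return self.x.copy(), reward, done


def expert_policy(state):
    # Placeholder for the trained DDPG actor that train_expert.py saves.
    return np.clip(-state, -1.0, 1.0)


def collect_buffer(env, policy, n_steps, noise_std=0.3, max_action=1.0, seed=0):
    """Roll out a noisy expert and record every transition."""
    rng = np.random.default_rng(seed)
    buffer = ReplayBuffer()
    state = env.reset()
    for _ in range(n_steps):
        base = policy(state)
        # Gaussian noise makes the data imperfect (an assumed setting).
        noise = rng.normal(0.0, noise_std * max_action, size=base.shape)
        action = np.clip(base + noise, -max_action, max_action)
        next_state, reward, done = env.step(action)
        buffer.add(state, action, next_state, reward, done)
        state = env.reset() if done else next_state
    return buffer


buffer = collect_buffer(ToyEnv(horizon=50), expert_policy, n_steps=200)
print(len(buffer))  # 200
```

Because the loop resets on `done` and keeps going, `n_steps` counts transitions rather than episodes. The resulting fixed buffer is what a batch learner such as BCQ trains on, without further environment interaction.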

If you are interested in standard forward RL tasks with DDPG or TD3, check out my other GitHub repositories.

FragLegs/BCQ

Folders and files

Latest commit

History

Repository files navigation

Off-Policy Deep Reinforcement Learning without Exploration

Overview

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages