forked from sfujim/BCQ

PyTorch implementation of BCQ for "Off-Policy Deep Reinforcement Learning without Exploration"

FragLegs/BCQ

Off-Policy Deep Reinforcement Learning without Exploration

Code accompanying the paper. If you use our code, please cite the paper.

The method is tested on MuJoCo continuous control tasks in OpenAI gym. Networks are trained using PyTorch 0.4 and Python 2.7.

Overview

The main algorithm, Batch-Constrained Q-learning (BCQ), is implemented in BCQ.py.
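For orientation, here is a minimal sketch of the batch-constrained action selection idea, not the code in BCQ.py. It assumes the agent draws several candidate actions from a generative model fit to the batch, nudges each by a small bounded perturbation, and takes the candidate with the highest Q-value. Every name below (`select_action`, `toy_generate`, `toy_perturb`, `toy_q`) is a hypothetical stand-in written in plain Python rather than PyTorch.

```python
import random


def select_action(state, generate, perturb, q_value, n_candidates=10, max_action=1.0):
    """Return the best of n perturbed candidate actions and its Q-value."""
    best_action, best_q = None, float("-inf")
    for _ in range(n_candidates):
        candidate = generate(state)          # action resembling the batch data
        shift = perturb(state, candidate)    # small bounded adjustment
        # Apply the shift and clip to the valid action range.
        action = [min(max_action, max(-max_action, a + d))
                  for a, d in zip(candidate, shift)]
        q = q_value(state, action)
        if q > best_q:
            best_action, best_q = action, q
    return best_action, best_q


# Toy stand-ins (assumptions for illustration only).
rng = random.Random(0)
batch_actions = [[-0.5], [0.0], [0.4]]


def toy_generate(state):
    # Sample close to an action that appears in the batch.
    base = rng.choice(batch_actions)
    return [b + rng.gauss(0.0, 0.05) for b in base]


def toy_perturb(state, action, phi=0.05):
    # Nudge toward 1.0, but never by more than phi in either direction.
    return [max(-phi, min(phi, 1.0 - a)) for a in action]


def toy_q(state, action):
    # Toy value function that prefers actions near 1.0.
    return -sum((a - 1.0) ** 2 for a in action)


action, value = select_action([0.0], toy_generate, toy_perturb, toy_q)
```

Because candidates come from the generator and the perturbation is bounded, the selected action stays near actions seen in the batch even though the toy value function would prefer 1.0.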

To reproduce results from the paper, first train an expert policy (DDPG) by running train_expert.py, which saves the expert model. Then collect a new buffer by running generate_buffer.py, either with the default settings or after adjusting the settings in the code.
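As a rough picture of what collecting a buffer involves, the sketch below rolls out a fixed policy in an environment and stores each transition. It is not the code in generate_buffer.py: `ToyEnv`, `expert_policy`, `collect_buffer`, and the Gaussian action noise with `noise_std` are all assumptions made for illustration, and the toy environment stands in for a gym MuJoCo task.

```python
import random


class ToyEnv:
    """Hypothetical 1-D environment standing in for a gym task."""

    def __init__(self, horizon=5):
        self.horizon = horizon

    def reset(self):
        self.t, self.x = 0, 0.0
        return [self.x]

    def step(self, action):
        self.x += action[0]
        self.t += 1
        reward = -abs(self.x - 1.0)       # closer to 1.0 is better
        done = self.t >= self.horizon     # fixed-length episodes
        return [self.x], reward, done


def collect_buffer(env, policy, n_transitions, noise_std=0.1, seed=0):
    """Roll out `policy` with Gaussian action noise and record transitions."""
    rng = random.Random(seed)
    buffer = []
    state = env.reset()
    while len(buffer) < n_transitions:
        action = [a + rng.gauss(0.0, noise_std) for a in policy(state)]
        next_state, reward, done = env.step(action)
        buffer.append((state, action, next_state, reward, float(done)))
        state = env.reset() if done else next_state
    return buffer


def expert_policy(state):
    # Stand-in for a trained expert: move toward x = 1.0.
    return [max(-1.0, min(1.0, 1.0 - state[0]))]


buffer = collect_buffer(ToyEnv(), expert_policy, n_transitions=12)
```

Each entry is a `(state, action, next_state, reward, done)` tuple. An off-policy learner would then train on this fixed buffer without interacting with the environment again.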

If you are interested in standard forward RL tasks with DDPG or TD3, see my other GitHub repository.
