WandB: https://wandb.ai/arth-shukla/PPO%20Gym%20Cart%20Pole
Proximal Policy Optimization Algorithms: https://arxiv.org/pdf/1707.06347.pdf
Algorithms/Concepts: PPO, Experience Replay
AI Development: PyTorch (Torch, CUDA), OpenAI Gym, WandB
More episode videos available on WandB: https://wandb.ai/arth-shukla/PPO%20Gym%20Cart%20Pole
The PPO model currently only supports discrete action spaces (categorical distribution). In OpenAI Gym CartPole, by episode 136, the agent is able to effectively "beat" CartPole.
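For a rough picture of what a discrete-action PPO policy looks like, here is a minimal sketch of a categorical actor head and the clipped surrogate objective from the paper linked above. This is not the repo's code; the names (`DiscreteActor`, `ppo_clip_loss`) and the small MLP architecture are illustrative assumptions.

```python
# Minimal sketch (not this repo's code): a discrete-action PPO policy head.
# The actor outputs logits, a Categorical distribution samples actions, and
# stored log-probs feed the clipped surrogate objective from the PPO paper.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class DiscreteActor(nn.Module):  # illustrative name
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),  # raw logits, one per discrete action
        )

    def dist(self, obs: torch.Tensor) -> Categorical:
        return Categorical(logits=self.net(obs))

def ppo_clip_loss(actor, obs, actions, old_log_probs, advantages, clip_eps=0.2):
    # Ratio of new to old action probabilities, clipped as in the PPO paper.
    new_log_probs = actor.dist(obs).log_prob(actions)
    ratio = torch.exp(new_log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```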

First, I want to implement algorithms that came before PPO (DQNs, or earlier actor-critic algorithms like DDPG) to get a stronger understanding of the math. I'll also get a chance to make agents for popular environments like Mario.
I also want to tackle more challenging environments, like the DM Control Suite. To do this, I'll explore PPO for continuous action spaces (through normal distributions), other similarly effective algorithms like SAC, and models like RecurrentPPO which offer some implementation challenges.
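Moving to continuous action spaces mostly means swapping the distribution the actor outputs. A minimal sketch, assuming a state-independent log standard deviation (the `GaussianActor` name and architecture are illustrative, not from this repo):

```python
# Hedged sketch of a continuous-action policy head using a Normal distribution;
# names and architecture are illustrative assumptions, not code from this repo.
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianActor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),  # per-dimension action mean
        )
        # State-independent log std, a common choice for PPO on continuous control.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs: torch.Tensor) -> Normal:
        return Normal(self.mean_net(obs), self.log_std.exp())

# Sampling and log-probs work as in the categorical case (summing log-probs over
# action dimensions), so the PPO clipped objective itself is unchanged.
```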
Finally, there are some other experience replay variants I'd like to implement, like Prioritized Experience Replay.
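For reference, here is a rough sketch of the proportional variant of Prioritized Experience Replay (Schaul et al., 2015), simplified to a flat array instead of a sum-tree. The class name, defaults, and structure are illustrative assumptions, not code from this repo.

```python
# Hedged sketch of proportional prioritized experience replay, simplified with a
# flat priority array instead of a sum-tree; not code from this repo.
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity: int, alpha: float = 0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, transition, priority: float = 1.0):
        # New transitions are typically given the current max priority.
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = priority ** self.alpha
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        probs = self.priorities[: len(self.data)].astype(np.float64)
        probs /= probs.sum()
        idxs = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct for the non-uniform sampling.
        weights = (len(self.data) * probs[idxs]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idxs], idxs, weights

    def update_priorities(self, idxs, new_priorities):
        self.priorities[idxs] = np.asarray(new_priorities) ** self.alpha
```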