Continuous control with deep reinforcement learning
About
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
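The deterministic policy gradient named in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a one-parameter linear actor `mu(s) = theta * s` and a hand-written, already-known critic `Q(s, a) = -(a - 2s)^2`, both hypothetical and chosen only so the result is easy to check. In the actual algorithm the critic is a learned network. The actor is updated by chaining the critic's gradient with respect to the action through the actor's gradient with respect to its parameter.

```python
import random

def critic_grad_wrt_action(state, action):
    # Hypothetical toy critic Q(s, a) = -(a - 2*s)**2, so dQ/da = -2*(a - 2*s).
    # In the paper's setting the critic would be a learned network instead.
    return -2.0 * (action - 2.0 * state)

def train_actor(steps=500, lr=0.05, batch_size=32, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # actor parameter: mu(s) = theta * s (assumed linear form)
    for _ in range(steps):
        states = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        grad = 0.0
        for s in states:
            a = theta * s  # deterministic action chosen by the actor
            # Deterministic policy gradient: dQ/da * dmu/dtheta, with dmu/dtheta = s.
            grad += critic_grad_wrt_action(s, a) * s
        grad /= batch_size
        theta += lr * grad  # gradient ascent on the critic's value
    return theta

theta = train_actor()
print(f"learned theta: {theta:.4f}")
```

Under this toy critic the best action is `a = 2s`, so `theta` should approach 2. The point of the sketch is only the chain-rule structure of the actor update, which never needs to sample or maximize over actions and therefore works in a continuous action space.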
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Reinforcement Learning | LunarLanderContinuous v2 | Mean Reward | 337.2 | 65 |
| Reinforcement Learning | MountainCarContinuous v0 | Average Agent Reward | 93.62 | 65 |
| Continuous Control | MuJoCo Ant v4 | Average Return | -237.8 | 46 |
| Reinforcement Learning | Walker2D v5 | Average Return | 200.3 | 45 |
| Reinforcement Learning Control | Pendulum v1 | Mean Score | 942.2 | 40 |
| Continuous Control | MuJoCo Walker2d v4 | Normalized Performance | 17.4466 | 39 |
| Continuous Control | MuJoCo HalfCheetah v4 | Average Return | 1.56e+4 | 36 |
| Reinforcement Learning | Pendulum | Avg Episode Reward | -155.6 | 26 |
| Reinforcement Learning | BipedalWalker | Average Episode Reward | 209.4 | 26 |
| Reinforcement Learning | MountainCar | Avg Episode Reward | 0.9536 | 18 |