Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning
About
We present DrQ-v2, a model-free reinforcement learning (RL) algorithm for visual continuous control. DrQ-v2 builds on DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements that yield state-of-the-art results on the DeepMind Control Suite. Notably, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from pixel observations, which was previously unattained by model-free RL. DrQ-v2 is conceptually simple, easy to implement, and has a significantly smaller computational footprint than prior work, with the majority of tasks taking just 8 hours to train on a single GPU. Finally, we publicly release DrQ-v2's implementation to provide RL practitioners with a strong and computationally efficient baseline.
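The core data augmentation behind DrQ and DrQ-v2 is a random shift of the pixel observation. A minimal NumPy sketch of that idea is below; the function name and padding size are illustrative (DrQ-v2's released implementation applies shifts on GPU with bilinear interpolation), but the pad-then-crop scheme is the standard recipe:

```python
import numpy as np

def random_shift(obs, pad=4, rng=None):
    """Random-shift augmentation (illustrative sketch):
    replicate-pad the image by `pad` pixels on each side,
    then crop a random window back to the original size."""
    rng = rng or np.random.default_rng()
    h, w, _ = obs.shape
    # Replicate edge pixels so shifted content stays plausible.
    padded = np.pad(obs, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Pick a random crop offset in [0, 2*pad] for each axis.
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

# Example: an 84x84 RGB observation keeps its shape after augmentation.
obs = np.zeros((84, 84, 3), dtype=np.uint8)
aug = random_shift(obs)
assert aug.shape == obs.shape
```

In training, the same augmentation is applied to the observations used by both the actor and the critic, which regularizes the convolutional encoder without changing the RL objective.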
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Continuous Control | DMC-GB video hard | Cartpole Swingup Score | 130 | 18 |
| Autonomous Driving | NoCrash Town02 | Return | 1.02e+3 | 15 |
| Finger Spin | DMControl Novel view (test) | Reward | 793.7 | 12 |
| Cup Catch | DMControl Novel view (test) | Reward | 919.8 | 12 |
| Continuous Control | DMC-GB video easy | Cartpole Swingup Score | 267 | 12 |
| Autonomous Driving | CARLA Leaderboard | Return | 1.51e+3 | 9 |
| Autonomous Driving | NoCrash (Town01) | Return | 1.66e+3 | 8 |
| LiftPegUpright | ManiSkill3 Medium Ground Texture | Success Rate | 42 | 7 |
| LiftPegUpright | ManiSkill3 Easy Lighting Direction v1 (test) | Success Rate | 46 | 7 |
| LiftPegUpright | ManiSkill3 Easy Ground Texture Test v1 (test) | Success Rate | 46 | 7 |