A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning
About
Q-learning algorithms are appealing for real-world applications due to their data efficiency, but they are prone to overfitting and training instabilities when trained from visual observations. Prior work, namely SVEA, finds that selective application of data augmentation can improve the visual generalization of RL agents without destabilizing training. We revisit its recipe for data augmentation and find an assumption that limits its effectiveness to augmentations of a photometric nature. Addressing this limitation, we propose a generalized recipe, SADA, that works with wider varieties of augmentations. We benchmark its effectiveness on DMC-GB2 (our proposed extension of the popular DMControl Generalization Benchmark) as well as tasks from Meta-World and the Distracting Control Suite, and find that SADA greatly improves training stability and generalization of RL agents across a diverse set of augmentations. Visualizations, code, and the benchmark are available at https://aalmuzairee.github.io/SADA/
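The abstract does not spell out either recipe, so the following is only a rough illustration of what "selective application" of augmentation could look like in a Q-learning update. It is a minimal sketch under stated assumptions: the bootstrapped target is computed from clean observations while the online Q-function sees both clean and augmented inputs. The function names, the equal loss weighting, the toy linear Q-function, and the brightness augmentation are all hypothetical and not taken from the SVEA or SADA code.

```python
import numpy as np

def selective_aug_td_loss(q_fn, target_q_fn, augment, obs, actions, rewards,
                          next_obs, gamma=0.99):
    """TD loss in which only the online Q-function's input is augmented."""
    idx = np.arange(len(actions))
    # The bootstrapped target uses the clean next observation, never an augmented one.
    target = rewards + gamma * target_q_fn(next_obs).max(axis=1)
    q_clean = q_fn(obs)[idx, actions]
    q_aug = q_fn(augment(obs))[idx, actions]
    # Equal weighting of the clean and augmented terms is an assumption.
    return 0.5 * np.mean((q_clean - target) ** 2) + 0.5 * np.mean((q_aug - target) ** 2)

def brightness_jitter(obs, rng, scale=0.2):
    """Toy photometric augmentation: one random brightness factor per sample."""
    factor = 1.0 + rng.uniform(-scale, scale, size=(obs.shape[0], 1))
    return np.clip(obs * factor, 0.0, 1.0)

# Toy data: flattened "pixel" observations in [0, 1] and a linear Q-function.
rng = np.random.default_rng(0)
batch, obs_dim, n_actions = 8, 16, 3
weights = rng.normal(size=(obs_dim, n_actions))
q_fn = lambda o: o @ weights
obs = rng.uniform(size=(batch, obs_dim))
next_obs = rng.uniform(size=(batch, obs_dim))
actions = rng.integers(0, n_actions, size=batch)
rewards = rng.uniform(size=batch)

loss = selective_aug_td_loss(q_fn, q_fn, lambda o: brightness_jitter(o, rng),
                             obs, actions, rewards, next_obs)
print(float(loss))
```

With an identity augmentation the loss reduces to the ordinary TD error, which makes the sketch easy to sanity-check. A geometric augmentation (for example a shift or rotation) could be passed as `augment` in the same way, which is the kind of augmentation the abstract says the earlier recipe handles poorly.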
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| PickCube | ManiSkill3 Medium Camera FOV v1 (test) | Success Rate | 11 | 7 |
| PickCube | ManiSkill Easy Camera Fov v3 (test) | Success Rate | 26 | 7 |
| PlaceAppleInBowl | ManiSkill3 Hard Camera Pose v1 (test) | Success Rate | 3 | 7 |
| PokeCube | ManiSkill Medium Table Color 3 (test) | Success Rate | 29 | 7 |
| PokeCube | ManiSkill3 Easy Camera Pose v1 (test) | Success Rate | 22 | 7 |
| PokeCube | ManiSkill3 Medium Camera FOV v1 (test) | Success Rate | 11 | 7 |
| PokeCube | ManiSkill3 Hard Camera Pose v1 (test) | Success Rate | 6 | 7 |
| PullCube | ManiSkill3 Easy Ro Texture | Success Rate | 70 | 7 |
| PullCube | ManiSkill Hard Lighting Direction 3 | Success Rate | 52 | 7 |
| PullCube | ManiSkill3 Medium Camera FOV v1 (test) | Success Rate | 19 | 7 |