Reinforcement Learning with Augmented Data

About

Learning from visual observations is a fundamental yet challenging problem in Reinforcement Learning (RL). Although algorithmic advances combined with convolutional neural networks have proved to be a recipe for success, current methods are still lacking on two fronts: (a) data-efficiency of learning and (b) generalization to new environments. To this end, we present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms. We perform the first extensive study of general data augmentations for RL on both pixel-based and state-based inputs, and introduce two new data augmentations: random translate and random amplitude scale. We show that augmentations such as random translate, crop, color jitter, patch cutout, random convolutions, and amplitude scale can enable simple RL algorithms to outperform complex state-of-the-art methods across common benchmarks. RAD sets a new state-of-the-art in terms of data-efficiency and final performance on the DeepMind Control Suite benchmark for pixel-based control as well as on the OpenAI Gym benchmark for state-based control. We further demonstrate that RAD significantly improves test-time generalization over existing methods on several OpenAI ProcGen benchmarks. Our RAD module and training code are available at https://www.github.com/MishaLaskin/rad.
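To make three of the augmentations named above concrete, here is a minimal NumPy sketch of random crop, random translate, and random amplitude scale. It is an illustration under stated assumptions, not the code from the RAD repository. The function names, the (batch, channel, height, width) layout, the zero-filled frame used for translation, and the uniform scaling range are all choices made for this example.

```python
import numpy as np


def random_crop(imgs, out_size, rng):
    """Cut a random out_size x out_size window from each image.

    imgs: array of shape (B, C, H, W) with H, W >= out_size (assumed layout).
    """
    b, c, h, w = imgs.shape
    tops = rng.integers(0, h - out_size + 1, size=b)
    lefts = rng.integers(0, w - out_size + 1, size=b)
    out = np.empty((b, c, out_size, out_size), dtype=imgs.dtype)
    for i, (t, l) in enumerate(zip(tops, lefts)):
        out[i] = imgs[i, :, t:t + out_size, l:l + out_size]
    return out


def random_translate(imgs, out_size, rng):
    """Place each image at a random offset inside a larger zero-filled frame.

    Assumption: out_size >= H, W and the background is filled with zeros.
    """
    b, c, h, w = imgs.shape
    tops = rng.integers(0, out_size - h + 1, size=b)
    lefts = rng.integers(0, out_size - w + 1, size=b)
    out = np.zeros((b, c, out_size, out_size), dtype=imgs.dtype)
    for i, (t, l) in enumerate(zip(tops, lefts)):
        out[i, :, t:t + h, l:l + w] = imgs[i]
    return out


def random_amplitude_scale(states, low, high, rng):
    """Multiply each state vector by one scalar drawn uniformly from [low, high).

    states: array of shape (B, D). The range (low, high) is a hypothetical
    hyperparameter for this sketch.
    """
    scale = rng.uniform(low, high, size=(states.shape[0], 1))
    return states * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pixels = rng.random((4, 3, 100, 100))   # fake pixel observations
    states = rng.random((4, 17))            # fake state observations
    print(random_crop(pixels, 84, rng).shape)
    print(random_translate(pixels, 108, rng).shape)
    print(random_amplitude_scale(states, 0.6, 1.4, rng).shape)
```

In a plug-and-play setup of the kind the abstract describes, such a function would be applied to each batch of observations before it is passed to an otherwise unchanged RL algorithm.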

Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas · 2020

Related benchmarks

Task | Dataset | Metric | Result | Rank
Point-Goal navigation | Gibson (held-out scenes) | Average SR (All Scenes) | 1.16e+3 | 30
Control | DMControl | DMControl: Ball in Cup Catch Score | 879.9 | 29
PointGoal Navigation | iGibson Ihlen 0 int 1.0 (test) | SR | 48.8 | 22
PointGoal Navigation | iGibson Rs int 1.0 (test) | Success Rate | 4.85e+3 | 22
PointGoal Navigation | iGibson Env Avg 1.0 (test) | SR | 3.63e+3 | 22
Continuous Control | DMC-GB video hard | Cartpole Swingup Score | 1.52e+4 | 18
Reinforcement Learning | DMControl | Hopper/Hop Error | 0.024 | 13
Pixel-based Control | DeepMind Control Suite 100k steps | Cheetah/Run Score | 419 | 9
Pixel-based Control | DeepMind Control Suite 500k environment steps | Cheetah Run Score | 548 | 9
Reinforcement Learning | DMControl Finger, spin (100k steps) | Total Reward | 856 | 7
Showing 10 of 38 rows
