Generalization in Reinforcement Learning by Soft Data Augmentation

About

Extensive efforts have been made to improve the generalization ability of Reinforcement Learning (RL) methods via domain randomization and data augmentation. However, as more factors of variation are introduced during training, optimization becomes increasingly challenging, and empirically may result in lower sample efficiency and unstable training. Instead of learning policies directly from augmented data, we propose SOft Data Augmentation (SODA), a method that decouples augmentation from policy learning. Specifically, SODA imposes a soft constraint on the encoder that aims to maximize the mutual information between latent representations of augmented and non-augmented data, while the RL optimization process uses strictly non-augmented data. Empirical evaluations are performed on diverse tasks from DeepMind Control suite as well as a robotic manipulation task, and we find SODA to significantly advance sample efficiency, generalization, and stability in training over state-of-the-art vision-based RL methods.

Nicklas Hansen, Xiaolong Wang • 2020
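The decoupling described in the abstract can be illustrated with a small sketch. The abstract does not give the exact objective, so the code below assumes one common surrogate for agreement between latents: a squared distance between L2-normalized encodings of the augmented and non-augmented observation, with a slowly updated target encoder for the non-augmented branch. The linear encoder, the noise-based `augment` function, and all names and hyperparameters here are hypothetical stand-ins, not the authors' implementation.

```python
import math
import random


def encode(weights, obs):
    """Linear encoder, latent = W @ obs (hypothetical stand-in for an image encoder)."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


def l2_normalize(vec, eps=1e-8):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def consistency_loss(online_w, target_w, obs, aug_obs):
    """Squared distance between the normalized latent of the augmented
    observation (online encoder) and of the clean observation (target encoder).
    Assumed surrogate objective; not confirmed by the abstract."""
    z_aug = l2_normalize(encode(online_w, aug_obs))
    z_clean = l2_normalize(encode(target_w, obs))
    return sum((a - b) ** 2 for a, b in zip(z_aug, z_clean))


def ema_update(target_w, online_w, tau=0.005):
    """Move the target weights a small step toward the online weights."""
    return [
        [(1.0 - tau) * t + tau * o for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target_w, online_w)
    ]


def augment(obs, rng, scale=0.1):
    """Hypothetical augmentation: additive Gaussian noise.
    A vision-based setup would use image-level augmentations instead."""
    return [x + rng.gauss(0.0, scale) for x in obs]


rng = random.Random(0)
online_w = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(2)]
target_w = [row[:] for row in online_w]  # target starts as a copy of the online encoder
obs = [0.5, -0.2, 0.1, 0.9]              # clean observation, the only input the RL loss would see

loss_same = consistency_loss(online_w, target_w, obs, obs)
loss_aug = consistency_loss(online_w, target_w, obs, augment(obs, rng))
print(f"loss without augmentation: {loss_same:.6f}")
print(f"loss with augmentation:    {loss_aug:.6f}")
```

In a full training loop under these assumptions, the policy and value losses would be computed from `obs` alone, while `consistency_loss` would contribute gradients only to the encoder, and `ema_update` would be called after each optimizer step.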

Related benchmarks

Task	Dataset	Metric	Result
Continuous Control	DMC-GB video hard	Cartpole Swingup Score	4.13e+4	18
Reinforcement Learning	DMC-GB2 Video Hard (test)	Cartpole Swingup Return	429	15
Robotic Manipulation	peg-in-box (test2)	Return	76	14
Continuous Control	DMC-GB video easy	Cartpole Swingup Score	617	12
Visual Reinforcement Learning	DMC-GB Color Hard	Average Return: Walker, Walk	692	10
Visual Reinforcement Learning	DMControl-GB Video-Easy	Walker Walk Score	768	10
Continuous Control	DMControl-GB natural videos 1.0 (test)	Walker Walk	768	8
Continuous Control	DMControl-GB random colors 1.0 (test)	Walker-Walk Score	697	8
Visual Reinforcement Learning	DMControl VDCS Markov-temporal perturbations (test)	Cartpole Swingup Score	615	8
Reach	Robotic Manipulation (Test1)	Episode Return	30.9	7

Showing 10 of 56 rows
