Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning

About

One of the key challenges in visual Reinforcement Learning (RL) is learning policies that generalize to unseen environments. Recently, data augmentation techniques that increase data diversity have been shown to improve the generalization ability of learned policies. However, because RL training is sensitive, naively applying data augmentation, which transforms each pixel in a task-agnostic manner, may cause instability and damage sample efficiency, thus further degrading generalization performance. At the heart of this phenomenon are the diverged action distribution and high-variance value estimation in the face of augmented images. To alleviate this issue, we propose Task-aware Lipschitz Data Augmentation (TLDA) for visual RL, which explicitly identifies the task-correlated pixels with large Lipschitz constants and augments only the task-irrelevant pixels. To verify the effectiveness of TLDA, we conduct extensive experiments on the DeepMind Control suite, CARLA, and DeepMind Manipulation tasks, showing that TLDA improves both sample efficiency at training time and generalization at test time. It outperforms previous state-of-the-art methods across the three visual control benchmarks.
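The core idea in the abstract, protecting pixels to which the policy is highly sensitive and augmenting only the rest, can be illustrated with a toy sketch. This is not the authors' implementation: the finite-difference sensitivity estimate, the fixed `threshold`, and the names `pixel_sensitivity` and `task_aware_augment` are all assumptions made for illustration, and the per-pixel loop is far too slow for real image observations.

```python
import numpy as np


def pixel_sensitivity(policy, obs, aug_obs, eps=1e-8):
    """Per-pixel finite-difference estimate of how strongly the policy output
    changes when that single pixel is swapped to its augmented value.
    This is a hypothetical stand-in for a per-pixel Lipschitz constant."""
    base = policy(obs)
    sens = np.zeros(obs.shape, dtype=np.float64)
    for idx in np.ndindex(obs.shape):
        delta = aug_obs[idx] - obs[idx]
        if abs(delta) < eps:
            continue  # pixel unchanged by the augmentation, nothing to measure
        perturbed = obs.copy()
        perturbed[idx] = aug_obs[idx]
        sens[idx] = np.linalg.norm(policy(perturbed) - base) / abs(delta)
    return sens


def task_aware_augment(policy, obs, augment, threshold):
    """Augment only pixels whose estimated sensitivity is at or below
    `threshold` (an assumed hyperparameter); keep the others untouched."""
    aug_obs = augment(obs)
    sens = pixel_sensitivity(policy, obs, aug_obs)
    protected = sens > threshold  # treated as task-correlated pixels
    return np.where(protected, obs, aug_obs), protected


if __name__ == "__main__":
    # Toy 4x4 "image"; the toy policy only looks at the central 2x2 patch.
    obs = np.arange(16, dtype=np.float64).reshape(4, 4) / 16.0
    policy = lambda x: np.array([x[1:3, 1:3].sum()])
    brighten = lambda x: x + 0.5  # deterministic stand-in for an augmentation
    out, protected = task_aware_augment(policy, obs, brighten, threshold=0.5)
    print(protected.astype(int))
```

In this toy setup only the central patch influences the policy, so the mask should cover that patch and the augmentation should land on the border pixels alone.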

Zhecheng Yuan, Guozheng Ma, Yao Mu, Bo Xia, Bo Yuan, Xueqian Wang, Ping Luo, Huazhe Xu • 2022

Related benchmarks

Task	Dataset	Result
Continuous Control	DMC-GB video hard	Cartpole Swingup Score 286	18
Continuous Control	DMC-GB video easy	Cartpole Swingup Score 671	12
Cheetah Run	DMC-GB color-jittered (test)	Average Return 371	6
Cartpole Swingup	DMC-GB color-jittered (test)	Average Return 760	6
Manipulation	DeepMind Manipulation tasks Modified Platform	Average Return 89	6
Manipulation	DeepMind Manipulation tasks Modified Both	Average Return 36	6
Walker Stand	DMC-GB color-jittered (test)	Average Return 947	6
Walker Walk	DMC-GB color-jittered (test)	Average Return 823	6
Ball In Cup Catch	DMC-GB color-jittered (test)	Average Return 932	6
Manipulation	DeepMind Manipulation tasks Modified Arm	Average Return 55	6

Showing 10 of 11 rows
