PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning

About

Learning good feature representations is important for deep reinforcement learning (RL). However, with limited experience, RL often suffers from data inefficiency during training. For unexperienced or less-experienced trajectories (i.e., state-action sequences), the lack of data limits their use for better feature learning. In this work, we propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance data efficiency for RL feature representation learning. Specifically, PlayVirtual predicts future states in the latent space based on the current state and action using a dynamics model, and then predicts the previous states using a backward dynamics model, which forms a trajectory cycle. Based on this, we augment the actions to generate a large number of virtual state-action trajectories. The virtual trajectories are free of ground-truth state supervision; we simply require each trajectory to satisfy the cycle consistency constraint, which can significantly enhance data efficiency. We validate the effectiveness of our designs on the Atari and DeepMind Control Suite benchmarks. Our method achieves state-of-the-art performance on both benchmarks.

Tao Yu, Cuiling Lan, Wenjun Zeng, Mingxiao Feng, Zhizheng Zhang, Zhibo Chen • 2021
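
The trajectory cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear dynamics, the squared-distance loss, and every name below (`forward_step`, `backward_step`, `cycle_loss`, `sample_virtual_actions`) are assumptions made for illustration, whereas the paper uses learned dynamics models in a latent space. The sketch only shows the shape of the idea: roll a latent state forward under sampled actions, roll it back with a backward model, and penalize the gap between the starting state and its reconstruction. No ground-truth next state is needed at any point.

```python
import random

def forward_step(z, a, w):
    """Toy forward dynamics (assumed linear): next latent = latent + w * action."""
    return [zi + w * ai for zi, ai in zip(z, a)]

def backward_step(z_next, a, w):
    """Toy backward dynamics (assumed linear): previous latent = next latent - w * action."""
    return [zi - w * ai for zi, ai in zip(z_next, a)]

def cycle_loss(z0, actions, w_fwd, w_bwd):
    """Roll forward through `actions`, roll back in reverse order, and return
    the squared distance between the start latent and its reconstruction."""
    z = z0
    for a in actions:
        z = forward_step(z, a, w_fwd)
    for a in reversed(actions):
        z = backward_step(z, a, w_bwd)
    return sum((x - y) ** 2 for x, y in zip(z0, z))

def sample_virtual_actions(horizon, dim, rng):
    """Sample random actions; these require no environment interaction."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(horizon)]

rng = random.Random(0)
z0 = [0.5, -0.2, 0.1]  # hypothetical encoded start state
virtual_actions = sample_virtual_actions(horizon=4, dim=3, rng=rng)

# A backward model that inverts the forward model closes the cycle (loss near zero).
consistent = cycle_loss(z0, virtual_actions, w_fwd=0.3, w_bwd=0.3)
# A mismatched backward model leaves a gap that training would push down.
inconsistent = cycle_loss(z0, virtual_actions, w_fwd=0.3, w_bwd=0.1)
print(consistent, inconsistent)
```

In a real agent the two step functions would be trainable networks and the cycle loss would be minimized alongside the RL objective; because the actions are sampled rather than collected, many such virtual trajectories can be generated from a single encoded state.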

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Continuous Control | DMControl 500k | Spin Score: 963 | 33 |
| Continuous Control | DMControl 100k | DMControl: Finger Spin Score: 915 | 29 |
| Reinforcement Learning | Atari 100k | Alien Score: 947.8 | 18 |
| Visual Reinforcement Learning | DMControl Finger, Spin | Episode Return: 915 | 16 |
| Visual Reinforcement Learning | DMControl Cheetah Run | Episode Return: 474 | 16 |
| Visual Reinforcement Learning | DMControl Ball in cup, Catch | Episode Return: 929 | 16 |
| Visual Reinforcement Learning | DMControl Cartpole, Swingup | Episode Return: 816 | 16 |
| Visual Reinforcement Learning | DMControl Reacher Easy | Episode Return: 785 | 16 |
| Visual Reinforcement Learning | DMControl Walker Walk | Episode Return: 460 | 16 |