
Decoupling Representation Learning from Reinforcement Learning

About

In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL in most environments. Additionally, we benchmark several leading UL algorithms by pre-training encoders on expert demonstrations and using them, with weights frozen, in RL agents; we find that agents using ATC-trained encoders outperform all others. We also train multi-task encoders on data from multiple environments and show generalization to different downstream RL tasks. Finally, we ablate components of ATC, and introduce a new data augmentation to enable replay of (compressed) latent images from pre-trained encoders when RL requires augmentation. Our experiments span visually diverse RL benchmarks in DeepMind Control, DeepMind Lab, and Atari, and our complete code is available at https://github.com/astooke/rlpyt/tree/master/rlpyt/ul.
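The core idea in the abstract (associate an observation with one a few steps later, under augmentation, using a contrastive loss) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the linear `encode` function stands in for the convolutional encoder, and the pad-and-crop `random_shift` augmentation, the dot-product InfoNCE loss, the temperature, and the offset `k` are all assumptions chosen for brevity. The linked rlpyt repository holds the actual method.

```python
import numpy as np


def random_shift(images, pad, rng):
    """Assumed augmentation: edge-pad each image, then crop back at a random offset."""
    n, h, w = images.shape
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.empty_like(images)
    for i in range(n):
        dy, dx = rng.integers(0, 2 * pad + 1, size=2)
        out[i] = padded[i, dy:dy + h, dx:dx + w]
    return out


def encode(images, weights):
    """Hypothetical stand-in for the conv encoder: flatten, project, L2-normalize."""
    flat = images.reshape(images.shape[0], -1)
    z = flat @ weights
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Row i of anchors should match row i of positives; other rows are negatives."""
    logits = anchors @ positives.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def temporal_pairs(trajectory, k):
    """Pair each observation o_t with o_{t+k} from the same trajectory."""
    return trajectory[:-k], trajectory[k:]


rng = np.random.default_rng(0)
trajectory = rng.random((16, 12, 12))      # 16 synthetic grayscale frames
weights = rng.normal(size=(12 * 12, 8))    # untrained, illustrative encoder weights

obs_t, obs_tk = temporal_pairs(trajectory, k=3)
anchors = encode(random_shift(obs_t, pad=2, rng=rng), weights)
positives = encode(random_shift(obs_tk, pad=2, rng=rng), weights)
loss = info_nce_loss(anchors, positives)
print(f"contrastive loss: {loss:.4f}")
```

In a training loop, the gradient of this loss would update only the encoder, while the RL agent consumes the encoder's output without backpropagating reward signals into it, which is the decoupling the abstract describes.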

Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin • 2020

Related benchmarks

Task | Dataset | Result | Rank
Reinforcement Learning | Atari 100k steps (test) | Median HNS: 0.237 | 20
Reach | Meta-World ML-1 (test) | Success Rate: 71.3 | 9
Autonomous Driving | CARLA Map 1 (Source) | Cumulative Reward: 2.28e+3 | 6
Point-Goal navigation | AI2THOR Seen target domains | Success Rate: 89.1 | 6
Point-Goal navigation | AI2THOR Unseen target domains | Success Rate: 81.9 | 6
Reach | egocentric-Metaworld (Unseen Target) | Success Rate: 72 | 6
Autonomous Driving | CARLA Map 1 (Seen Target) | Sum of Rewards: 1.68e+3 | 6
Autonomous Driving | CARLA Map 2 (Source) | Cumulative Reward: 2.27e+3 | 6
Autonomous Driving | CARLA Map 2 (Seen Target) | Sum of Rewards: 2.25e+3 | 6
Object Goal Navigation | AI2THOR Source domains | Success Rate: 82.2 | 6
Showing 10 of 22 rows
