Contrastive Learning as Goal-Conditioned Reinforcement Learning

About

In reinforcement learning (RL), it is easier to solve a task if given a good representation. While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable and instead equips RL algorithms with additional representation learning parts (e.g., auxiliary losses, data augmentation). How can we design RL algorithms that directly acquire good representations? In this paper, instead of adding representation learning parts to an existing RL algorithm, we show that (contrastive) representation learning methods can be cast as RL algorithms in their own right. To do this, we build upon prior work and apply contrastive representation learning to action-labeled trajectories, in such a way that the (inner product of) learned representations exactly corresponds to a goal-conditioned value function. We use this idea to reinterpret a prior RL method as performing contrastive learning, and then use the idea to propose a much simpler method that achieves similar performance. Across a range of goal-conditioned RL tasks, we demonstrate that contrastive RL methods achieve higher success rates than prior non-contrastive methods, including in the offline RL setting. We also show that contrastive RL outperforms prior methods on image-based tasks, without using data augmentation or auxiliary objectives.
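The core idea in the abstract, a critic defined as the inner product of a state-action representation and a goal representation and trained contrastively, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the representations are passed in as plain lists of floats rather than produced by learned encoders, and the binary (NCE-style) cross-entropy objective is an assumed choice of contrastive loss. Row i of each input is assumed to form a positive pair (a goal reached later in the same trajectory), and every other pairing in the batch is treated as a negative.

```python
import math


def inner(u, v):
    """Inner product of two equal-length vectors; plays the role of the critic value."""
    return sum(a * b for a, b in zip(u, v))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def contrastive_critic_loss(sa_reprs, goal_reprs):
    """Binary contrastive loss over a batch (illustrative sketch only).

    sa_reprs[i]   : assumed representation of a (state, action) pair
    goal_reprs[i] : assumed representation of a goal from the same trajectory
    Diagonal pairs (i, i) are positives; off-diagonal pairs are negatives.
    """
    n = len(sa_reprs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            logit = inner(sa_reprs[i], goal_reprs[j])
            if i == j:
                total -= log_sigmoid(logit)    # push positives toward label 1
            else:
                total -= log_sigmoid(-logit)   # push negatives toward label 0
    return total / (n * n)


# Toy usage: aligned representations should score a lower loss than swapped ones.
sa = [[3.0, 0.0], [0.0, 3.0]]
aligned_goals = [[3.0, 0.0], [0.0, 3.0]]
swapped_goals = [[0.0, 3.0], [3.0, 0.0]]
print(contrastive_critic_loss(sa, aligned_goals))
print(contrastive_critic_loss(sa, swapped_goals))
```

Under this sketch, a policy would pick actions whose state-action representation has a large inner product with the commanded goal's representation; how that policy is trained is outside what the abstract specifies.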

Benjamin Eysenbach, Tianjun Zhang, Ruslan Salakhutdinov, Sergey Levine • 2022

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	puzzle-4x4-play OGBench 5 tasks v0	Average Success Rate: 0.00e+0	28
Goal-conditioned manipulation	OGBench puzzle-4x4-play	Score: 0.00e+0	24
Goal-conditioned Reinforcement Learning	antmaze stitch medium	Success Rate: 69	23
Goal-conditioned Reinforcement Learning	antmaze stitch large	Success Rate: 13	23
Manipulation	OGBench cube-triple-play	Success Rate: 6	19
Offline Goal-Conditioned Reinforcement Learning	antmaze medium-navigate v0	Success Rate: 95	14
Offline Goal-Conditioned Reinforcement Learning	humanoidmaze large-navigate v0	Success Rate: 24	14
Goal-conditioned Reinforcement Learning	humanoidmaze stitch medium	Success Rate: 40	14
Goal-conditioned Reinforcement Learning	humanoidmaze stitch large	Success Rate: 4	14
Goal-conditioned Reinforcement Learning	manipulation scene-play	Success Rate: 11	14

Showing 10 of 202 rows
