CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

About

We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skills to learn behavior embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioral diversity. We evaluate our algorithm on the Unsupervised Reinforcement Learning Benchmark, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. CIC substantially improves over prior methods in terms of adaptation efficiency, outperforming prior unsupervised skill discovery methods by 1.79x and the next leading overall exploration algorithm by 1.18x.
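
The two ingredients named in the abstract, a contrastive objective between state-transition and skill embeddings and an entropy-based intrinsic reward on those embeddings, can be illustrated with a minimal NumPy sketch. The abstract does not say which estimators are used, so this sketch assumes an InfoNCE-style loss and a k-nearest-neighbor (particle-based) entropy proxy. The function names, the temperature, and the value of k are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def info_nce_loss(transition_emb, skill_emb, temperature=0.5):
    """InfoNCE-style loss (assumed estimator, not confirmed by the abstract).

    Row i of transition_emb is treated as the positive pair for row i of
    skill_emb; every other row in the batch serves as a negative.
    Embeddings are assumed to be nonzero.
    """
    # L2-normalize so that dot products are cosine similarities.
    q = transition_emb / np.linalg.norm(transition_emb, axis=1, keepdims=True)
    k = skill_emb / np.linalg.norm(skill_emb, axis=1, keepdims=True)
    logits = q @ k.T / temperature  # shape (batch, batch)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal.
    return -np.mean(np.diag(log_probs))


def knn_entropy_reward(emb, k=3):
    """Particle-based entropy proxy (assumed): log distance to the k-th neighbor.

    Requires k < number of rows in emb. Embeddings far from their neighbors
    receive a larger reward, which favors diverse behavior.
    """
    diffs = emb[:, None, :] - emb[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # shape (n, n)
    # After sorting each row, column 0 is the zero self-distance and
    # column k is the distance to the k-th nearest neighbor.
    kth = np.sort(dists, axis=1)[:, k]
    return np.log(1.0 + kth)
```

Under these assumptions, calling `info_nce_loss` on correctly paired embeddings should give a lower loss than calling it on mismatched pairs, and `knn_entropy_reward` should assign its largest value to an embedding that lies far from the rest of the batch.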

Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, Pieter Abbeel • 2022

Related benchmarks

Task | Dataset | Result | Rank
Run | URLB Walker 1.0 (test) | Mean Score 535 | 12
Stand | URLB Walker 1.0 (test) | Mean Score 968 | 12
Bottom Left | URLB Jaco 1.0 (test) | Mean Score 147 | 12
Bottom Right | URLB Jaco 1.0 (test) | Mean Score 150 | 12
Flip | URLB Walker 1.0 (test) | Mean Score 715 | 12
Top Left | URLB Jaco 1.0 (test) | Mean Score 145 | 12
Walk | URLB Walker 1.0 (test) | Mean Score 914 | 12
Jump | URLB Quadruped 1.0 (test) | Mean Score 541 | 12
Stand | URLB Quadruped 1.0 (test) | Mean Score 717 | 12
Unsupervised Reinforcement Learning | URL Benchmark (Walker) | Flip Score 218 | 12
Showing 10 of 17 rows
