CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery
About
We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skills to learn behavior embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioral diversity. We evaluate our algorithm on the Unsupervised Reinforcement Learning Benchmark, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. CIC substantially improves over prior methods in terms of adaptation efficiency, outperforming prior unsupervised skill discovery methods by 1.79x and the next leading overall exploration algorithm by 1.18x.
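The two ingredients named in the abstract, a contrastive objective between state-transition embeddings and skill embeddings and an entropy-based intrinsic reward, can be illustrated with a minimal sketch. The abstract does not specify the exact loss or entropy estimator, so the InfoNCE-style loss and the k-nearest-neighbor (particle-based) entropy proxy below are assumptions chosen for illustration. The function names `info_nce_loss` and `knn_entropy_reward`, the `temperature` and `k` parameters, and the use of plain Python lists in place of learned encoders are likewise hypothetical.

```python
import math


def _normalize(vec):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(transition_embs, skill_embs, temperature=0.5):
    """InfoNCE-style loss; entry i of each list forms the positive pair.

    Every other skill embedding in the batch acts as a negative for
    transition i. A lower loss means matched pairs are more similar
    than mismatched ones.
    """
    queries = [_normalize(v) for v in transition_embs]
    keys = [_normalize(v) for v in skill_embs]
    total = 0.0
    for i, q in enumerate(queries):
        # Cosine similarity to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(q, key)) / temperature for key in keys]
        # Numerically stable log-sum-exp over the row.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability of the positive pair.
        total += log_z - logits[i]
    return total / len(queries)


def knn_entropy_reward(embs, k=2):
    """Particle-based entropy proxy used as an intrinsic reward.

    Each embedding is rewarded by log(1 + mean distance to its k nearest
    neighbors), so embeddings in sparsely visited regions score higher.
    """
    rewards = []
    for i, e in enumerate(embs):
        dists = sorted(math.dist(e, other) for j, other in enumerate(embs) if j != i)
        rewards.append(math.log1p(sum(dists[:k]) / k))
    return rewards
```

In a full agent, the embeddings would come from learned encoders trained by minimizing the contrastive loss, and the per-sample rewards would be passed to the reinforcement learning update during reward-free pre-training; this sketch only shows the two quantities being computed on fixed vectors.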
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Run | URLB Walker 1.0 (test) | Mean Score: 535 | 12 |
| Stand | URLB Walker 1.0 (test) | Mean Score: 968 | 12 |
| Bottom Left | URLB Jaco 1.0 (test) | Mean Score: 147 | 12 |
| Bottom Right | URLB Jaco 1.0 (test) | Mean Score: 150 | 12 |
| Flip | URLB Walker 1.0 (test) | Mean Score: 715 | 12 |
| Top Left | URLB Jaco 1.0 (test) | Mean Score: 145 | 12 |
| Walk | URLB Walker 1.0 (test) | Mean Score: 914 | 12 |
| Jump | URLB Quadruped 1.0 (test) | Mean Score: 541 | 12 |
| Stand | URLB Quadruped 1.0 (test) | Mean Score: 717 | 12 |
| Unsupervised Reinforcement Learning | URL Benchmark (Walker) | Flip Score: 218 | 12 |