Lipschitz-constrained Unsupervised Skill Discovery

About

We study the problem of unsupervised skill discovery, whose goal is to learn a set of diverse and useful skills with no external reward. There have been a number of skill discovery methods based on maximizing the mutual information (MI) between skills and states. However, we point out that their MI objectives usually prefer static skills to dynamic ones, which may hinder their application to downstream tasks. To address this issue, we propose Lipschitz-constrained Skill Discovery (LSD), which encourages the agent to discover more diverse, dynamic, and far-reaching skills. Another benefit of LSD is that its learned representation function can be utilized for solving goal-following downstream tasks even in a zero-shot manner, i.e., without further training or complex planning. Through experiments on various MuJoCo robotic locomotion and manipulation environments, we demonstrate that LSD outperforms previous approaches in terms of skill diversity, state space coverage, and performance on seven downstream tasks, including the challenging task of following multiple goals on Humanoid. Our code and videos are available at https://shpark.me/projects/lsd/.

Seohong Park, Jongwook Choi, Jaekyeom Kim, Honglak Lee, Gunhee Kim • 2022
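
The abstract does not spell out the objective, so the following is only a minimal sketch of the idea it describes, under stated assumptions. It assumes a reward of the form (phi(s') - phi(s)) · z, where phi is a 1-Lipschitz representation function and z is a skill vector. That form is our reading of the method and is not given on this page. The linear map with spectral normalization is a stand-in chosen for illustration, not the authors' implementation, and the names `spectral_normalize`, `lsd_style_reward`, and `zero_shot_skill` are hypothetical.

```python
import numpy as np


def spectral_normalize(W):
    """Scale W so its largest singular value is at most 1.

    The linear map s -> W @ s is then 1-Lipschitz in the Euclidean norm.
    """
    sigma = np.linalg.norm(W, ord=2)  # largest singular value of a 2-D array
    return W / max(sigma, 1.0)


def lsd_style_reward(phi, s, s_next, z):
    """Assumed reward form: displacement in representation space along z."""
    return float((phi(s_next) - phi(s)) @ z)


def zero_shot_skill(phi, s, goal, eps=1e-8):
    """Assumed zero-shot rule: unit skill vector pointing from phi(s) to phi(goal)."""
    d = phi(goal) - phi(s)
    return d / (np.linalg.norm(d) + eps)


# Illustrative setup: 4-D states, 2-D skills, random linear representation.
rng = np.random.default_rng(0)
W = spectral_normalize(rng.normal(size=(2, 4)))
phi = lambda s: W @ s

s = np.zeros(4)
z = np.array([1.0, 0.0])

# A static transition earns nothing; moving along W.T @ z earns a positive reward.
s_next = s + W.T @ z
static_reward = lsd_style_reward(phi, s, s, z)
moving_reward = lsd_style_reward(phi, s, s_next, z)

# Because phi is 1-Lipschitz, the reward cannot exceed ||s_next - s|| * ||z||.
reward_bound = np.linalg.norm(s_next - s) * np.linalg.norm(z)

goal = rng.normal(size=4)
z_goal = zero_shot_skill(phi, s, goal)
```

Under these assumptions, the Lipschitz constraint ties the reward to actual distance traveled in state space: a skill can only score highly by moving far, which is one way to read the abstract's claim that LSD favors dynamic over static skills.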

Related benchmarks

Task	Dataset	Metric	Result
State Exploration	Maze2D Square-b	State Coverage Ratio	43	22
State Exploration	Maze2D Square-a	State Coverage Ratio	42	11
State Exploration	Maze2D Square-d	State Coverage Ratio	0.45	11
State Exploration	Maze2D Square-tree	State Coverage Ratio	28	11
State Exploration	Maze2D Square-c	State Coverage Ratio	37	11
State Exploration	Maze2D Corridor2	State Coverage Ratio	56	11
Safe Locomotion	Humanoid Hole	Safe State Ratio	100	7
Downstream Task Performance	Ant North	Average Performance	-2.02e+3	7
Hierarchical Control	Halfcheetah	Performance Score	32.73	7
Downstream Task Performance	Ant Range	Average Performance	-894.3	7

Showing 10 of 20 rows
