
Constrained Ensemble Exploration for Unsupervised Skill Discovery

About

Unsupervised Reinforcement Learning (RL) provides a promising paradigm for learning useful behaviors via reward-free pre-training. Existing methods for unsupervised RL mainly conduct empowerment-driven skill discovery or entropy-based exploration. However, empowerment often leads to static skills, and pure exploration only maximizes state coverage rather than learning useful behaviors. In this paper, we propose a novel unsupervised RL framework based on an ensemble of skills, where each skill performs partition exploration guided by state prototypes. Each skill thus explores its clustered area locally, while the ensemble of skills maximizes overall state coverage. We impose state-distribution constraints between each skill's occupancy measure and its desired cluster to learn distinguishable skills. We provide theoretical analysis of the state entropy and the resulting skill distributions. In extensive experiments on several challenging tasks, our method learns well-explored ensemble skills and achieves superior performance on various downstream tasks compared to previous methods.
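To make the idea of prototype-based partition exploration concrete, here is a minimal sketch: visited states are clustered into one prototype per skill, and each skill is rewarded for staying near its own prototype while the ensemble covers all clusters. This is an illustrative simplification; the clustering via plain k-means, the function names, and the distance-based reward are assumptions for exposition, not the paper's actual objective or constraint formulation.

```python
import numpy as np

def fit_prototypes(states, n_skills, n_iters=50, seed=0):
    """Cluster visited states into one prototype per skill.

    Illustrative sketch only: uses plain k-means; the paper's actual
    prototype update and state-distribution constraints are not reproduced.
    """
    rng = np.random.default_rng(seed)
    # initialize prototypes from randomly chosen visited states
    protos = states[rng.choice(len(states), n_skills, replace=False)].astype(float)
    for _ in range(n_iters):
        # assign each state to its nearest prototype (partition of the state space)
        dists = np.linalg.norm(states[:, None, :] - protos[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(n_skills):
            members = states[assign == k]
            if len(members):
                protos[k] = members.mean(axis=0)  # move prototype to cluster mean
    return protos

def intrinsic_reward(state, skill, protos):
    """Hypothetical intrinsic reward: a bonus when the state's nearest
    prototype is the skill's own, minus the distance to that prototype,
    encouraging each skill to explore locally within its cluster."""
    d = np.linalg.norm(protos - state, axis=-1)
    return float(d.argmin() == skill) - d[skill]
```

Under this sketch, a skill collects positive reward only inside its assigned partition, so distinct skills are pushed toward distinct regions while jointly covering the explored states.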

Chenjia Bai, Rushuai Yang, Qiaosheng Zhang, Kang Xu, Yi Chen, Ting Xiao, Xuelong Li • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
State Exploration | Maze2D Square-b | State Coverage Ratio | 66 | 2
State Exploration | Maze2D Square-a | State Coverage Ratio | 71 | 1
State Exploration | Maze2D Square-c | State Coverage Ratio | 60 | 1
State Exploration | Maze2D Square-d | State Coverage Ratio | 0.57 | 1
State Exploration | Maze2D Corridor2 | State Coverage Ratio | 82 | 1
State Exploration | Maze2D Square-tree | State Coverage Ratio | 40 | 1
Robotic Manipulation | MetaWorld | Success Rate: Pick-out-of-hole | 0.00e+0 | 7
