SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

About

Diffusion models have demonstrated strong potential for robotic trajectory planning. However, generating coherent trajectories from high-level instructions remains challenging, especially for long-range composition tasks requiring multiple sequential skills. We propose SkillDiffuser, an end-to-end hierarchical planning framework integrating interpretable skill learning with conditional diffusion planning to address this problem. At the higher level, the skill abstraction module learns discrete, human-understandable skill representations from visual observations and language instructions. These learned skill embeddings are then used to condition the diffusion model to generate customized latent trajectories aligned with the skills. This allows the model to generate diverse state trajectories that adhere to the learned skills. By integrating skill learning with conditional trajectory generation, SkillDiffuser produces coherent behavior following abstract instructions across diverse tasks. Experiments on multi-task robotic manipulation benchmarks such as Meta-World and LOReL demonstrate state-of-the-art performance and human-interpretable skill representations from SkillDiffuser. More visualization results and information can be found on our website.
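The two-level idea described above (pick a discrete skill, then generate a trajectory conditioned on that skill's embedding) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the nearest-neighbour codebook lookup stands in for the skill abstraction module, `toy_denoiser` is a hypothetical hand-written stand-in for a learned noise predictor, and the refinement loop is a simplified substitute for a real reverse diffusion process.

```python
import numpy as np

def quantize_skill(features, codebook):
    """Pick the discrete skill whose codebook vector is closest to `features`.

    Assumption: skills are selected by nearest-neighbour lookup in a codebook.
    Returns (skill index, skill embedding).
    """
    dists = np.linalg.norm(codebook - features, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

def toy_denoiser(traj, skill_emb):
    """Hypothetical stand-in for a learned, skill-conditioned noise predictor.

    It treats the offset between the current trajectory and a skill-specific
    target (the embedding repeated at every timestep) as the "noise".
    """
    target = np.tile(skill_emb, (traj.shape[0], 1))
    return traj - target

def sample_trajectory(skill_emb, horizon=8, steps=50, seed=0):
    """Start from Gaussian noise and refine it step by step, conditioned on the skill."""
    rng = np.random.default_rng(seed)
    traj = rng.normal(size=(horizon, skill_emb.shape[0]))
    for t in range(steps, 0, -1):
        # Remove a 1/t fraction of the predicted noise; the final step (t = 1)
        # removes all that remains.
        traj = traj - toy_denoiser(traj, skill_emb) / t
    return traj
```

Calling `quantize_skill` on a feature vector and passing the returned embedding to `sample_trajectory` yields an array of shape `(horizon, embedding_dim)`. In this toy version the skill choice fully determines where the trajectory ends up, which is the conditioning behaviour the sketch is meant to show; a trained model would instead learn the noise predictor from data.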

Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, Ping Luo • 2023

Related benchmarks

Task	Dataset	Result
Robotic Manipulation	Calvin ABCD→D	Avg Length 3.66	139
Robot Manipulation	MimicGen	Coffee Success Rate 77	25
Dexterous Hand Control	Adroit	Overall Avg Success Rate 63	19
Robot Manipulation	RLBench single-view setup	Average Success Rate 74.4	15
Tool-based Manipulation	DexArt	DexArt Avg Success Rate 62	11
Robotic Manipulation	LOReL Sawyer Dataset (test)	Close Drawer Success Rate 9.532	6
Robotic Manipulation	MimicGen (test)	Square Success Rate 75	6
Robotic Manipulation	MimicGen	Square 75	6
Robot Manipulation	LOReL Sawyer (unseen multi-step composition)	Success Rate 25.21	4
Robot Manipulation	Real-world manipulation (ALOHA) zero-shot 1.0	Clean Cup 30	4

Showing 10 of 16 rows
