DreamPolicy: A Unified World-model Policy for Scalable Humanoid Locomotion

About

Achieving versatile humanoid locomotion with a single policy presents a critical scalability challenge. Prevailing methods often rely on distilling multiple terrain-specific teacher policies into a unified student policy. However, while such distillation captures basic locomotion primitives, it struggles to compose these skills when adapting to complex environments, resulting in poor generalization to novel composite terrains unseen during training. To overcome this, we present DreamPolicy, a unified framework that integrates offline data with a diffusion-based world model, enabling a single policy to master both known and unseen terrains. Central to our approach is a terrain-aware autoregressive diffusion world model trained on aggregated rollouts from specialized policies. This model synthesizes physically plausible future trajectories, which serve as dynamic objectives for a conditioned policy, thereby bypassing manual reward engineering. Unlike distillation, our world model captures generalizable locomotion skills, allowing for robust zero-shot transfer to unseen composite terrains. DreamPolicy also scales naturally with data availability: as the offline dataset expands, the diffusion world model continuously acquires richer skills. Experiments demonstrate that DreamPolicy outperforms the strongest baseline by up to 27% on unseen terrains and 38% on combined terrains. By unifying world model-based planning and policy learning, DreamPolicy breaks the "one task, one policy" bottleneck and establishes a scalable, data-driven paradigm for generalist humanoid control.
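The control flow described in the abstract (a world model proposes future states, and a policy conditioned on them picks the action) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the dimensions, the `denoiser` stand-in (a hand-written update, not a learned network), the random linear `policy`, and the function names are hypothetical and are not taken from the DreamPolicy paper or its code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for this toy example.
STATE_DIM, ACTION_DIM, HORIZON, DENOISE_STEPS = 8, 4, 5, 10


def denoiser(noisy_state, history, terrain, step):
    """Stand-in for a learned denoising network (assumption).

    It pulls the noisy sample toward a toy target: the latest state in the
    history shifted by a terrain-dependent offset.
    """
    target = history[-1] + 0.1 * terrain
    return noisy_state + (target - noisy_state) / (step + 1)


def sample_future(history, terrain):
    """Autoregressively sample HORIZON future states.

    Each state starts as Gaussian noise, is refined by iterative denoising,
    and is then appended to the context used for the next state.
    """
    context = list(history)
    future = []
    for _ in range(HORIZON):
        x = rng.standard_normal(STATE_DIM)
        for step in reversed(range(DENOISE_STEPS)):
            x = denoiser(x, np.asarray(context), terrain, step)
        future.append(x)
        context.append(x)
    return np.stack(future)


# Random linear map standing in for a trained conditioned policy (assumption).
W = 0.1 * rng.standard_normal((ACTION_DIM, 2 * STATE_DIM))


def policy(state, goal_state):
    """Map the current state and the predicted next state to a bounded action."""
    features = np.concatenate([state, goal_state - state])
    return np.tanh(W @ features)


def control_step(history, terrain):
    """One control step: imagine a future, then act toward its first state."""
    future = sample_future(history, terrain)
    action = policy(history[-1], future[0])
    return action, future


history = np.zeros((3, STATE_DIM))   # three past states
terrain = np.ones(STATE_DIM)         # hypothetical terrain embedding
action, future = control_step(history, terrain)
print(action.shape, future.shape)
```

In this sketch the predicted trajectory plays the role of the "dynamic objective": the policy receives no hand-written reward, only the imagined next state to track. How the actual method conditions and trains the policy is not specified on this page.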

Yahao Fan, Tianxiang Gui, Kaiyang Ji, Shutong Ding, Chixuan Zhang, Yifeng Xu, Ke Yang, Jiayuan Gu, Jingyi Yu, Jingya Wang, Ye Shi • 2025

Related benchmarks

Task	Dataset	Result
Humanoid Locomotion	Slope bridge unseen (test)	Success Rate: 100	8
Humanoid Locomotion	Balancing Beam unseen (test)	Success Rate: 96.35	4
Humanoid Locomotion	Wave unseen (test)	Success Rate: 100	4
Locomotion	Stair&Uneven combined terrain (test)	Success Rate: 88.46	2
Locomotion	Bridge&Uneven combined terrain (test)	Success Rate: 84.78	2
Locomotion	Stair&Gap combined terrain (test)	Success Rate: 95.79	2
Locomotion	Stair&Bridge combined terrain (test)	Success Rate: 91.45	2
Locomotion	Bridge&Gap combined terrain (test)	Success Rate: 94.68	2

