DreamPolicy: A Unified World-model Policy for Scalable Humanoid Locomotion

About

Achieving versatile humanoid locomotion with a single policy presents a critical scalability challenge. Prevailing methods often rely on distilling multiple terrain-specific teacher policies into a unified student policy. While such distillation captures basic locomotion primitives, it struggles to compose these skills to adapt to complex environments, and so generalizes poorly to novel composite terrains unseen during training. To overcome this, we present DreamPolicy, a unified framework that integrates offline data with a diffusion-based world model, enabling a single policy to master both known and unseen terrains. Central to our approach is a terrain-aware, autoregressive diffusion world model trained on aggregated rollouts from specialized policies. This model synthesizes physically plausible future trajectories, which serve as dynamic objectives for a conditioned policy, thereby bypassing manual reward engineering. Unlike distillation, our world model captures generalizable locomotion skills, allowing for robust zero-shot transfer to unseen composite terrains. DreamPolicy also scales naturally with data availability: as the offline dataset expands, the diffusion world model continuously acquires richer skills. Experiments demonstrate that DreamPolicy outperforms the strongest baseline by up to 27% on unseen terrains and 38% on combined terrains. By unifying world model-based planning and policy learning, DreamPolicy breaks the "one task, one policy" bottleneck and establishes a scalable, data-driven paradigm for generalist humanoid control.

Yahao Fan, Tianxiang Gui, Kaiyang Ji, Shutong Ding, Chixuan Zhang, Yifeng Xu, Ke Yang, Jiayuan Gu, Jingyi Yu, Jingya Wang, Ye Shi • 2025
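
The control loop described in the abstract (a world model that autoregressively predicts future states, and a policy conditioned on those predictions) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `denoise_next_state`, `dream_trajectory`, and `conditioned_policy`, the simple relaxation used in place of a learned diffusion denoiser, and the proportional tracking policy are all hypothetical stand-ins chosen only to show the data flow.

```python
import random
from typing import List

State = List[float]


def denoise_next_state(history: List[State], terrain: State,
                       rng: random.Random, steps: int = 4) -> State:
    """Hypothetical stand-in for sampling the next state by iterative denoising.

    A real diffusion model would use a learned denoiser; here the sample is
    simply relaxed from Gaussian noise toward a terrain-shifted copy of the
    most recent state.
    """
    last = history[-1]
    target = [x + t for x, t in zip(last, terrain)]
    sample = [rng.gauss(0.0, 1.0) for _ in last]  # start from pure noise
    for _ in range(steps):
        # Each step removes half of the remaining "noise" (toy update rule).
        sample = [s + 0.5 * (tg - s) for s, tg in zip(sample, target)]
    return sample


def dream_trajectory(history: List[State], terrain: State,
                     horizon: int, rng: random.Random) -> List[State]:
    """Autoregressive rollout: each predicted state is fed back as context."""
    window = list(history)
    future: List[State] = []
    for _ in range(horizon):
        nxt = denoise_next_state(window, terrain, rng)
        future.append(nxt)
        window = window[1:] + [nxt]  # slide the context window forward
    return future


def conditioned_policy(state: State, goal_trajectory: List[State],
                       gain: float = 1.0) -> State:
    """Toy policy conditioned on the predicted trajectory.

    It tracks the first predicted state with a proportional action, so the
    prediction acts as the objective instead of a hand-written reward.
    """
    goal = goal_trajectory[0]
    return [gain * (g - s) for g, s in zip(goal, state)]


if __name__ == "__main__":
    rng = random.Random(0)
    state = [0.0, 0.0]
    plan = dream_trajectory([state], terrain=[0.1, 0.0], horizon=3, rng=rng)
    print(conditioned_policy(state, plan))
```

In this sketch the policy never sees a reward; it only receives the predicted trajectory as its target, which mirrors the abstract's description of predicted trajectories serving as dynamic objectives.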

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Humanoid Locomotion | Slope bridge unseen (test) | Success Rate | 100 | 8 |
| Humanoid Locomotion | Balancing Beam unseen (test) | Success Rate | 96.35 | 4 |
| Humanoid Locomotion | Wave unseen (test) | Success Rate | 100 | 4 |
| Locomotion | Stair&Uneven combined terrain (test) | Success Rate | 88.46 | 2 |
| Locomotion | Bridge&Uneven combined terrain (test) | Success Rate | 84.78 | 2 |
| Locomotion | Stair&Gap combined terrain (test) | Success Rate | 95.79 | 2 |
| Locomotion | Stair&Bridge combined terrain (test) | Success Rate | 91.45 | 2 |
| Locomotion | Bridge&Gap combined terrain (test) | Success Rate | 94.68 | 2 |