Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets

About

Imitation learning has emerged as a promising approach to building generalist robots. However, scaling imitation learning to large robot foundation models remains challenging because it relies on high-quality expert demonstrations. Meanwhile, large amounts of video data depicting a wide range of environments and diverse behaviors are readily available. This data provides a rich source of information about real-world dynamics and agent-environment interactions. Leveraging it directly for imitation learning, however, has proven difficult because it lacks action annotations. In this work, we present Unified World Models (UWM), a framework for leveraging both video and action data for policy learning. Specifically, a UWM integrates an action diffusion process and a video diffusion process within a unified transformer architecture, where independent diffusion timesteps govern each modality. By controlling each diffusion timestep, UWM can flexibly represent a policy, a forward dynamics model, an inverse dynamics model, and a video generator. Through simulated and real-world experiments, we show that: (1) UWM enables effective pretraining on large-scale multitask robot datasets with both dynamics and action predictions, resulting in policies that are more generalizable and robust than those trained with imitation learning; and (2) UWM naturally facilitates learning from action-free video data through independent control of the modality-specific diffusion timesteps, further improving the performance of finetuned policies. Our results suggest that UWM offers a promising step toward harnessing large, heterogeneous datasets for scalable robot learning, and provides a simple unification between the often disparate paradigms of imitation learning and world modeling. Videos and code are available at https://weirdlabuw.github.io/uwm/.
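The key mechanism in the abstract is that actions and video each carry their own diffusion timestep, so choosing which modality is clean, being denoised, or left as pure noise selects the model's role. The sketch below illustrates that idea only. It assumes that timestep 0 means "clean, given as a condition" and that the maximum timestep means "pure noise, effectively marginalized out". The names `T_MAX`, `ModalityRoles`, `MODES`, `inference_schedule`, `sample_training_timesteps`, and `active_losses`, the 100-step horizon, and the treatment of action-free video are hypothetical illustrations, not taken from the released UWM code.

```python
import random
from dataclasses import dataclass

# Hypothetical number of diffusion steps; the real value is not stated here.
T_MAX = 100


@dataclass(frozen=True)
class ModalityRoles:
    """Role of each modality during sampling: "clean", "denoise", or "noise"."""
    action: str
    video: str


# Hypothetical mapping from the four roles named in the abstract to timestep roles.
# "clean"   -> timestep 0 (the modality is given as a condition)
# "noise"   -> timestep t_max (pure noise, i.e. effectively marginalized out)
# "denoise" -> the timestep follows the reverse-diffusion counter
MODES = {
    "policy": ModalityRoles(action="denoise", video="noise"),
    "forward_dynamics": ModalityRoles(action="clean", video="denoise"),
    "inverse_dynamics": ModalityRoles(action="denoise", video="clean"),
    "video_generation": ModalityRoles(action="noise", video="denoise"),
}


def _timestep_for(role, k, t_max):
    """Map a modality role to its diffusion timestep at reverse step k."""
    if role == "clean":
        return 0
    if role == "noise":
        return t_max
    return k  # "denoise": follow the sampler's countdown


def inference_schedule(mode, t_max=T_MAX):
    """(t_action, t_video) pairs for each reverse step, from t_max down to 1."""
    roles = MODES[mode]
    return [
        (_timestep_for(roles.action, k, t_max), _timestep_for(roles.video, k, t_max))
        for k in range(t_max, 0, -1)
    ]


def sample_training_timesteps(has_actions, t_max=T_MAX, rng=random):
    """Draw independent per-modality timesteps for one training example."""
    t_video = rng.randint(1, t_max)
    # Assumption: for action-free video, the action timestep is pinned to t_max,
    # so the action input is pure noise and only video prediction is learned.
    t_action = rng.randint(1, t_max) if has_actions else t_max
    return t_action, t_video


def active_losses(has_actions):
    """Denoising losses used for a sample (assumption: no action loss without labels)."""
    return {"action", "video"} if has_actions else {"video"}


if __name__ == "__main__":
    print(inference_schedule("policy", t_max=3))  # [(3, 3), (2, 3), (1, 3)]
```

Because each modality has its own timestep, a single network can serve all four roles; only the timestep pair fed to it changes. For the actual sampling schedule and conditioning details, refer to the code linked from the project page.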

Chuning Zhu, Raymond Yu, Siyuan Feng, Benjamin Burchfiel, Paarth Shah, Abhishek Gupta (2025)

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Robot Manipulation | LIBERO | Object Achievement | 97.6 | 957 |
| Robotic Manipulation | LIBERO | Spatial Success Rate | 82.3 | 527 |
| Robotic Manipulation | Calvin ABC->D | Task-1 Score | 81.3 | 71 |
| Robotic Manipulation | RoboCasa | Average Success Rate | 60.8 | 39 |
| Robot Manipulation | RoboTwin Clean 2.0 | Average Success Rate | 81.7 | 39 |
| Robot Manipulation | RoboTwin Randomized 2.0 | -- | -- | 33 |
| Robotic Tabletop Manipulation | RoboCasa GR1 Tabletop Tasks | Average Success Rate | 20 | 28 |
| Tabletop Manipulation | LIBERO | Success Rate | 79 | 17 |
| Robot Manipulation | RoboCasa-GR1 24 tasks | Average Success Rate | 60.8 | 16 |
| Kitchen Manipulation | RoboCasa 24 kitchen manipulation tasks | Average Success Rate | 60.8 | 12 |

(10 of 22 rows shown.)
