DDP-WM: Disentangled Dynamics Prediction for Efficient World Models
About
World models are essential for autonomous robotic planning. However, the substantial computational overhead of existing dense Transformer-based models significantly hinders real-time deployment. To address this efficiency-performance bottleneck, we introduce DDP-WM, a novel world model centered on the principle of Disentangled Dynamics Prediction (DDP). We hypothesize that latent state evolution in observed scenes is heterogeneous and can be decomposed into sparse primary dynamics driven by physical interactions and secondary context-driven background updates. DDP-WM realizes this decomposition through an architecture that integrates efficient historical processing with dynamic localization to isolate primary dynamics. By employing a cross-attention mechanism for background updates, the framework optimizes resource allocation and provides a smooth optimization landscape for planners. Extensive experiments demonstrate that DDP-WM achieves significant efficiency and performance gains across diverse tasks, including navigation, precise tabletop manipulation, and complex deformable or multi-body interactions. Specifically, on the challenging Push-T task, DDP-WM achieves an approximately 9x inference speedup and improves the MPC success rate from 90% to 98% compared to state-of-the-art dense models. The results establish a promising path for developing efficient, high-fidelity world models. Code will be available at https://github.com/HCPLab-SYSU/DDP-WM.
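The decomposition described above can be illustrated with a toy sketch. This is not the authors' implementation: the function name `ddp_step`, the change-magnitude heuristic used for localization, the random weight matrix standing in for a learned predictor, and the single-head attention are all assumptions made for illustration. It only shows the general shape of the idea: predict a small set of "primary" tokens directly, then update the remaining "background" tokens by cross-attending to them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ddp_step(tokens, prev_tokens, k, rng):
    """One hypothetical disentangled prediction step on latent tokens of shape (N, D)."""
    n, d = tokens.shape

    # Dynamic localization (assumed heuristic, not from the paper):
    # treat the k tokens that changed most since the previous frame as primary.
    change = np.linalg.norm(tokens - prev_tokens, axis=1)
    primary_idx = np.argsort(change)[-k:]
    background_idx = np.setdiff1d(np.arange(n), primary_idx)

    # Primary dynamics: a random residual map stands in for a learned predictor
    # that runs on the sparse subset only.
    w_primary = rng.standard_normal((d, d)) / np.sqrt(d)
    primary_next = tokens[primary_idx] + np.tanh(tokens[primary_idx] @ w_primary)

    # Background update: single-head cross-attention with background tokens as
    # queries and the predicted primary tokens as keys and values.
    q = tokens[background_idx]
    attn = softmax(q @ primary_next.T / np.sqrt(d), axis=-1)  # shape (N - k, k)
    background_next = q + attn @ primary_next

    # Reassemble the full token grid in its original order.
    next_tokens = np.empty_like(tokens)
    next_tokens[primary_idx] = primary_next
    next_tokens[background_idx] = background_next
    return next_tokens, primary_idx

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))
prev_tokens = tokens.copy()
prev_tokens[[2, 5, 11]] += 5.0  # pretend three tokens moved between frames
next_tokens, primary_idx = ddp_step(tokens, prev_tokens, k=3, rng=rng)
```

In this sketch the attention matrix has shape (N - k, k) rather than (N, N), which is where a sparse primary set would reduce cost relative to dense self-attention over all tokens; how DDP-WM itself localizes and predicts primary dynamics is specified in the paper, not here.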
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Navigation | PointMaze | Success Rate | 100 | 21 |
| Robotic Control | PushT | Time (s) | 16 | 14 |
| Table-top manipulation | Push T | Success Rate | 98 | 5 |
| 2D Navigation | Wall | Success Rate | 98 | 5 |
| Deformable body manipulation | Rope | Chamfer Distance | 0.31 | 4 |
| Multi-body system manipulation | Granular | Contact Distance | 0.24 | 4 |
| Dynamics Prediction | Push T | Throughput (samples/sec) | 1.56e+3 | 2 |
| Dynamics Prediction | Wall | Throughput (samples/sec) | 2.17e+3 | 2 |
| MPC decision loop | PointMaze | Decision Loop Time (s) | 5.5 | 2 |
| MPC decision loop | Wall | Decision Loop Time (s) | 4.2 | 2 |