From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction
About
Despite remarkable progress in driving world models, their potential for autonomous systems remains largely untapped: most world models are learned for world simulation and are decoupled from trajectory planning. Recent efforts unify world modeling and planning in a single framework, but the mechanism by which world modeling actively facilitates planning still requires further exploration. In this work, we introduce a new driving paradigm named Policy World Model (PWM). PWM integrates world modeling and trajectory planning within a unified architecture, and it also uses the learned world knowledge to benefit planning through the proposed action-free future state forecasting scheme. Through collaborative state-action prediction, PWM can mimic human-like anticipatory perception, which yields more reliable planning performance. To improve the efficiency of video forecasting, we further introduce a dynamically enhanced parallel token generation mechanism, equipped with a context-guided tokenizer and an adaptive dynamic focal loss. Despite using only front-camera input, our method matches or exceeds state-of-the-art approaches that rely on multi-view and multi-modal inputs. Code and model weights will be released at https://github.com/6550Zhao/Policy-World-Model.
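The abstract names an "adaptive dynamic focal loss" for token-based video forecasting but does not define it. The sketch below shows one plausible shape for such a loss, under stated assumptions, and is not the authors' method. It applies the standard focal loss of Lin et al. to discrete future-frame tokens and adds a hypothetical "dynamic" weight that up-weights tokens whose id changed relative to the previous frame, so that moving regions count more than static background. The names `dynamic_focal_loss`, `prev_tokens`, and `dynamic_weight` are illustrative inventions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax over one token's vocabulary logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def dynamic_focal_loss(
    logits: Sequence[Sequence[float]],  # [num_tokens][vocab_size] predicted logits
    targets: Sequence[int],             # ground-truth token ids of the future frame
    prev_tokens: Sequence[int],         # token ids at the same positions in the previous frame
    gamma: float = 2.0,                 # focusing parameter of the standard focal loss
    dynamic_weight: float = 2.0,        # HYPOTHETICAL extra weight for tokens that changed
) -> float:
    """Weighted mean focal loss over tokens.

    The change-based weighting is an assumed stand-in for the paper's
    adaptive scheme, which is not specified in the abstract.
    """
    total, weight_sum = 0.0, 0.0
    for row, target, prev in zip(logits, targets, prev_tokens):
        p_t = softmax(row)[target]
        # Focal term: down-weights tokens the model already predicts confidently.
        focal = -((1.0 - p_t) ** gamma) * math.log(max(p_t, 1e-12))
        # Assumed "dynamic" weighting: emphasize tokens that differ from the previous frame.
        w = dynamic_weight if target != prev else 1.0
        total += w * focal
        weight_sum += w
    return total / weight_sum
```

With `gamma=0.0` and `dynamic_weight=1.0` the function reduces to plain token-level cross-entropy, which makes it easy to check against a baseline before tuning the focusing and dynamic terms.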
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Autonomous Driving Planning | NAVSIM v1 | NC | 98.9 | 86 |
| Autonomous Driving Planning | NAVSIM v1 (test) | NC | 98.6 | 59 |
| Autonomous Driving | NAVSIM (test) | PDMS | 88.1 | 48 |
| Autonomous Driving | NAVSIM (navtest) | PDMS | 88.1 | 26 |
| End-to-end Motion Planning | nuScenes | L2 Displacement Error (1s) | 2.06 | 22 |
| Closed-loop Planning | NAVSIM Navtest (test) | PDMS | 88.1 | 16 |
| Video Generation | NAVSIM | FVD | 85.95 | 5 |
| End-to-end Motion Planning | Bench2Drive CARLA | L2 Error (1s) | 1.7 | 3 |
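Two rows above report an L2 error at a 1 s horizon. Open-loop planning papers compute this metric under at least two conventions: the displacement at exactly that horizon, or the displacement averaged over all waypoints up to it. The listing does not say which one each leaderboard uses. The sketch below implements both and assumes 2D waypoints in meters spaced 0.5 s apart, a common setup for nuScenes planning. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Waypoint = Tuple[float, float]  # (x, y), assumed to be in meters


def _num_steps(horizon_s: float, dt: float, length: int) -> int:
    # Number of waypoints covering the horizon, assuming the first waypoint is at t = dt.
    n = round(horizon_s / dt)
    if n < 1 or n > length:
        raise ValueError("horizon is outside the trajectory")
    return n


def l2_at_horizon(pred: Sequence[Waypoint], gt: Sequence[Waypoint],
                  horizon_s: float, dt: float = 0.5) -> float:
    """Euclidean error of the single waypoint at the given horizon."""
    n = _num_steps(horizon_s, dt, min(len(pred), len(gt)))
    (px, py), (gx, gy) = pred[n - 1], gt[n - 1]
    return math.hypot(px - gx, py - gy)


def l2_avg_up_to(pred: Sequence[Waypoint], gt: Sequence[Waypoint],
                 horizon_s: float, dt: float = 0.5) -> float:
    """Euclidean error averaged over all waypoints up to the given horizon."""
    n = _num_steps(horizon_s, dt, min(len(pred), len(gt)))
    errors = [math.hypot(p[0] - g[0], p[1] - g[1]) for p, g in zip(pred[:n], gt[:n])]
    return sum(errors) / n
```

For the same trajectory the two conventions can differ noticeably, for example 1.0 versus 0.5 when only the second waypoint is off by 1 m. For that reason, L2 numbers are only comparable when the protocol matches.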