# Enhancing End-to-End Autonomous Driving with Latent World Model

## About
In autonomous driving, end-to-end planners consume raw sensor data directly, allowing them to extract richer scene features and lose less information than traditional modular planners. This raises a crucial research question: how can we develop better scene feature representations to fully leverage sensor data in end-to-end driving? Self-supervised learning has shown great success in learning rich feature representations in NLP and computer vision. Inspired by this, we propose a novel self-supervised learning approach using the LAtent World model (LAW) for end-to-end driving. LAW predicts future scene features from current features and ego trajectories. This self-supervised task integrates seamlessly into both perception-free and perception-based frameworks, improving scene feature learning and trajectory prediction. LAW achieves state-of-the-art performance across multiple benchmarks, including the real-world open-loop benchmarks nuScenes and NAVSIM and the simulator-based closed-loop benchmark CARLA. The code is released at https://github.com/BraveGroup/LAW.
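The core self-supervised objective, predicting future latent scene features from current features and the ego trajectory, can be illustrated with a toy regression. The linear dynamics, feature sizes, and plain gradient-descent loop below are illustrative assumptions for the sketch, not the paper's actual architecture:

```python
import numpy as np

# Toy sketch of the LAW-style self-supervised objective: predict future scene
# features from current features plus the ego action. The linear dynamics,
# dimensions, and training loop are illustrative assumptions; the real model
# uses learned latent features from sensor data.

rng = np.random.default_rng(0)
n, feat_dim, traj_dim = 64, 10, 2

z_t = rng.normal(size=(n, feat_dim))   # current latent scene features (toy)
a_t = rng.normal(size=(n, traj_dim))   # ego trajectory / action (toy)
W_true = rng.normal(size=(feat_dim + traj_dim, feat_dim))
x_t = np.concatenate([z_t, a_t], axis=1)
z_next = x_t @ W_true                  # "observed" future features (toy dynamics)

def world_model(x, W):
    """Predict next latent features from concatenated [features, action]."""
    return x @ W

# Self-supervised training: MSE between predicted and observed future features.
W = np.zeros_like(W_true)
lr = 0.3
for _ in range(500):
    pred = world_model(x_t, W)
    W -= lr * x_t.T @ (pred - z_next) / n   # gradient of mean-squared error

mse = float(np.mean((world_model(x_t, W) - z_next) ** 2))
print(f"reconstruction MSE after training: {mse:.2e}")
```

The key point the sketch captures is that supervision comes from the future features themselves, so no manual labels are required.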
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Open-loop planning | nuScenes v1.0 (val) | L2 (1s) | 0.31 | 59 |
| Planning | nuScenes (val) | Collision Rate (Avg) | 19 | 52 |
| Autonomous Driving Planning | NAVSIM (navtest) | NC | 97.2 | 50 |
| Autonomous Driving | CARLA Town05 (Long) | DS | 70.1 | 46 |
| Planning | nuScenes v1.0-trainval (val) | -- | -- | 39 |
| Autonomous Driving | NAVSIM (test) | PDMS | 84.6 | 34 |
| Trajectory Prediction | nuScenes v1.0-trainval (val) | L2 Error (1s) | 0.24 | 23 |
| Planning | NAVSIM (test) | PDMS | 84.6 | 22 |
| Open-loop Autonomous Driving Planning | NAVSIM 1.0 (test) | NC | 96.4 | 19 |
| End-to-end Planning | NAVSIM v1 | NC | 0.974 | 17 |