DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

About

World models, especially in autonomous driving, are trending and drawing extensive attention due to their capacity for comprehending driving environments. The established world model holds immense potential for the generation of high-quality driving videos, and driving policies for safe maneuvering. However, a critical limitation in relevant research lies in its predominant focus on gaming environments or simulated settings, thereby lacking the representation of real-world driving scenarios. Therefore, we introduce DriveDreamer, a pioneering world model entirely derived from real-world driving scenarios. Regarding that modeling the world in intricate driving scenes entails an overwhelming search space, we propose harnessing the powerful diffusion model to construct a comprehensive representation of the complex environment. Furthermore, we introduce a two-stage training pipeline. In the initial phase, DriveDreamer acquires a deep understanding of structured traffic constraints, while the subsequent stage equips it with the ability to anticipate future states. The proposed DriveDreamer is the first world model established from real-world driving scenarios. We instantiate DriveDreamer on the challenging nuScenes benchmark, and extensive experiments verify that DriveDreamer empowers precise, controllable video generation that faithfully captures the structural constraints of real-world traffic scenarios. Additionally, DriveDreamer enables the generation of realistic and reasonable driving policies, opening avenues for interaction and practical applications.

Xiaofeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, Jiwen Lu• 2023

Related benchmarks

Task	Dataset	Result
3D Object Detection	nuScenes (val)	NDS39.5	981
Open-loop planning	nuScenes (val)	--	225
Video Generation	nuScenes (val)	FVD340.8	101
Planning	nuScenes (val)	Collision Rate (Avg)15	97
Multi-view video generation	nuScenes (val)	FID14.9	39
Driving Scene Generation	nuScenes (val)	FID14.9	27
Video Prediction	nuScenes (val)	FID14.9	24
Video Generation	nuScenes	FVD340.8	20
Frame prediction	nuScenes	FID52.6	16
Camera Generation	nuScenes v1.0-trainval (val)	FID52.6	11

Showing 10 of 17 rows

Other info

Follow for update

@wizwand_team Discord