
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

About

World models have demonstrated superiority in autonomous driving, particularly in the generation of multi-view driving videos. However, significant challenges still exist in generating customized driving videos. In this paper, we propose DriveDreamer-2, which builds upon the framework of DriveDreamer and incorporates a Large Language Model (LLM) to generate user-defined driving videos. Specifically, an LLM interface is initially incorporated to convert a user's query into agent trajectories. Subsequently, an HDMap adhering to traffic regulations is generated based on the trajectories. Ultimately, we propose the Unified Multi-View Model to enhance temporal and spatial coherence in the generated driving videos. DriveDreamer-2 is the first world model to generate customized driving videos; it can generate uncommon driving scenarios (e.g., vehicles abruptly cutting in) in a user-friendly manner. Moreover, experimental results demonstrate that the generated videos enhance the training of driving perception methods (e.g., 3D detection and tracking). Furthermore, the video generation quality of DriveDreamer-2 surpasses that of other state-of-the-art methods, with FID and FVD scores of 11.2 and 55.7, representing relative improvements of 30% and 50%.
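The three-stage pipeline described above (LLM interface → trajectory-conditioned HDMap → Unified Multi-View Model) can be sketched as follows. This is a minimal illustrative sketch only: all class and function names are hypothetical placeholders, not the authors' actual implementation or API.

```python
# Hypothetical sketch of the DriveDreamer-2 pipeline from the abstract.
# All names here are illustrative stand-ins, not the real codebase.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position in the ego frame, metres


@dataclass
class AgentTrajectory:
    agent_type: str         # e.g. "car", "pedestrian"
    waypoints: List[Point]  # future positions over the clip


def llm_interface(user_query: str) -> List[AgentTrajectory]:
    """Stage 1: an LLM converts a free-form user query into agent
    trajectories. Stubbed here with a fixed cut-in scenario."""
    return [AgentTrajectory("car", [(5.0, 3.5), (8.0, 1.0), (12.0, 0.0)])]


def generate_hdmap(trajs: List[AgentTrajectory]) -> Dict[str, list]:
    """Stage 2: generate an HDMap that is consistent with traffic rules
    and the given trajectories. Here, just lane bounds bracketing them."""
    xs = [x for t in trajs for (x, _) in t.waypoints]
    return {"lanes": [(min(xs) - 2.0, max(xs) + 2.0)]}


def unified_multiview_model(trajs, hdmap, n_views: int = 6, n_frames: int = 8):
    """Stage 3: render temporally and spatially coherent multi-view video
    conditioned on trajectories and the HDMap. Placeholder returning the
    would-be video tensor layout (views x frames)."""
    return ("video", n_views, n_frames)


trajs = llm_interface("a vehicle abruptly cuts in from the right")
hdmap = generate_hdmap(trajs)
video = unified_multiview_model(trajs, hdmap)
print(video)  # ('video', 6, 8)
```

The key design point the abstract emphasizes is that the user only supplies the text query; trajectories and the rule-compliant HDMap are generated automatically as intermediate structured conditions for the video model.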

Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang • 2024

Related benchmarks

Task | Dataset | Result | Rank
3D Multi-Object Tracking | nuScenes (val) | AMOTA: 31.3 | 115
Video Generation | nuScenes (val) | FVD: 55.7 | 37
Video Prediction | nuScenes (val) | FID: 25 | 16
Camera Generation | nuScenes v1.0-trainval (val) | FID: 25 | 11
Camera Generation | nuScenes (val) | FID: 11.2 | 10
Generative World Model Attack Evaluation | DriveDreamer v2 (test) | FID: 18.4 | 3
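As a sanity check on the headline numbers, the abstract's "relative improvements of 30% and 50%" imply prior-best baselines that can be back-solved from the reported FID 11.2 and FVD 55.7. The baselines below are inferred from those stated percentages, not quoted from the paper.

```python
# Back-of-envelope check: a relative improvement r over a baseline b
# giving score s means s = b * (1 - r), so b = s / (1 - r).
def relative_improvement(baseline: float, ours: float) -> float:
    """Fraction by which `ours` improves on `baseline` (lower is better)."""
    return (baseline - ours) / baseline


# FID 11.2 at ~30% relative improvement implies a prior best near 16.0.
fid_baseline = 11.2 / (1 - 0.30)
# FVD 55.7 at ~50% relative improvement implies a prior best near 111.4.
fvd_baseline = 55.7 / (1 - 0.50)

print(round(fid_baseline, 1))  # 16.0
print(round(fvd_baseline, 1))  # 111.4
print(round(relative_improvement(fid_baseline, 11.2), 2))  # 0.3
```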
