OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning

About

Advances in vision-language models (VLMs) have led to growing interest in leveraging their strong reasoning capabilities for autonomous driving. However, extending these capabilities from 2D to full 3D understanding is crucial for real-world applications. To address this challenge, we propose OmniDrive, a holistic vision-language dataset that aligns agent models with 3D driving tasks through counterfactual reasoning. This approach enhances decision-making by evaluating potential scenarios and their outcomes, similar to human drivers considering alternative actions. Our counterfactual-based synthetic data annotation process generates large-scale, high-quality datasets, providing denser supervision signals that bridge planning trajectories and language-based reasoning. Further, we explore two advanced OmniDrive-Agent frameworks, namely Omni-L and Omni-Q, to assess the importance of vision-language alignment versus 3D perception, revealing critical insights into designing effective LLM agents. Significant improvements on the DriveLM Q&A benchmark and nuScenes open-loop planning demonstrate the effectiveness of our dataset and methods.

Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Open-loop planning | nuScenes (val) | L2 Error (3s): 0.55 | 225 |
| Open-loop planning | nuScenes | L2 Error (Avg): 0.33 | 121 |
| Planning | nuScenes (val) | Collision Rate (Avg): 30 | 97 |
| Open-loop planning | nuScenes v1.0 (val) | L2 (1s): 0.14 | 71 |
| Trajectory Planning | nuScenes | L2 Error (m) (1s): 0.14 | 58 |
| Open-loop planning | NuScenes v1.0 (test) | L2 Error (1s): 0.14 | 50 |
| Planning | nuScenes v1.0-trainval (val) | ST-P3 L2 Error (1s): 0.14 | 39 |
| Visual Question Answering | NuscenesQA | Accuracy: 59.2 | 33 |
| Motion Planning | nuScenes v1.0 (val) | L2 Error (3s): 0.55 | 29 |
| Open-loop trajectory prediction | NuScenes v1.0 (test) | L2 Error (1s): 0.14 | 29 |
Showing 10 of 27 rows
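
To make the L2 figures above easier to interpret, here is a minimal sketch of how an open-loop L2 error at a given horizon is commonly computed from predicted and ground-truth waypoints. It is not the authors' evaluation code. The function names, the 0.5 s waypoint spacing (nuScenes keyframes are annotated at 2 Hz), and the toy trajectories are assumptions made for illustration. Two conventions are in circulation: reporting the error at the horizon timestep only, or averaging the errors over all timesteps up to the horizon (the convention usually associated with the ST-P3 label). The page does not state which one each row uses.

```python
import math


def l2_errors(pred, gt):
    """Per-waypoint Euclidean distance (metres) between two (x, y) trajectories."""
    if len(pred) != len(gt):
        raise ValueError("trajectories must have the same number of waypoints")
    return [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]


def l2_at_horizon(pred, gt, horizon_s, dt=0.5):
    """Error at the single waypoint located horizon_s seconds ahead."""
    steps = round(horizon_s / dt)  # assumed spacing: one waypoint every dt seconds
    return l2_errors(pred, gt)[steps - 1]


def l2_avg_up_to_horizon(pred, gt, horizon_s, dt=0.5):
    """Mean error over every waypoint from the first one up to horizon_s."""
    steps = round(horizon_s / dt)
    errs = l2_errors(pred, gt)[:steps]
    return sum(errs) / len(errs)


# Hypothetical 3 s plan (6 waypoints at 2 Hz): the ground truth drives straight
# ahead, while the prediction drifts sideways by 0.1 m per step.
gt_traj = [(0.0, 2.0 * i) for i in range(1, 7)]
pred_traj = [(0.1 * i, 2.0 * i) for i in range(1, 7)]

for horizon in (1, 2, 3):
    at_h = l2_at_horizon(pred_traj, gt_traj, horizon)
    avg_h = l2_avg_up_to_horizon(pred_traj, gt_traj, horizon)
    print(f"{horizon}s  at-horizon={at_h:.2f} m  averaged={avg_h:.2f} m")
```

Because the averaged variant includes the earlier, smaller errors, it yields lower numbers than the at-horizon variant for the same plan, so L2 values are only comparable across rows that share a convention.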
