
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

About

A primary hurdle for autonomous driving in urban environments is understanding complex, long-tail scenarios, such as challenging road conditions and subtle human behaviors. We introduce DriveVLM, an autonomous driving system that leverages Vision-Language Models (VLMs) for enhanced scene understanding and planning. DriveVLM integrates a unique combination of reasoning modules for scene description, scene analysis, and hierarchical planning. Furthermore, recognizing that VLMs are limited in spatial reasoning and have heavy computational requirements, we propose DriveVLM-Dual, a hybrid system that combines the strengths of DriveVLM with the traditional autonomous driving pipeline. Experiments on both the nuScenes dataset and our SUP-AD dataset demonstrate the efficacy of DriveVLM and DriveVLM-Dual in handling complex and unpredictable driving conditions. Finally, we deploy DriveVLM-Dual on a production vehicle, verifying that it is effective in real-world autonomous driving environments.

Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao• 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Open-loop planning | nuScenes (val) | L2 error, 3 s (m) | 0.68 | 225 |
| Open-loop planning | nuScenes | L2 error, average (m) | 0.31 | 121 |
| Planning | nuScenes (val) | -- | -- | 97 |
| Open-loop planning | nuScenes v1.0 (val) | L2 error, 1 s (m) | 0.15 | 71 |
| Trajectory planning | nuScenes | L2 error, 1 s (m) | 0.15 | 58 |
| Open-loop planning | nuScenes v1.0 (test) | L2 error, 1 s (m) | 0.15 | 50 |
| Planning | nuScenes v1.0-trainval (val) | L2 error, 1 s, ST-P3 protocol (m) | 0.15 | 39 |
| Open-loop trajectory prediction | nuScenes v1.0 (test) | L2 error, 1 s (m) | 0.15 | 29 |
| End-to-end planning | nuScenes (open-loop) | L2 error, 1 s (m) | 0.15 | 24 |
| Open-loop planning | nuScenes open-loop evaluation | L2 error, 1 s (m) | 0.15 | 18 |

10 of 19 benchmark entries shown.
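Most results in the table above are L2 errors, in metres, between the planned and ground-truth ego trajectories at a 1 s, 2 s, or 3 s horizon. The sketch below illustrates how such numbers are typically computed; it is not the authors' evaluation code. It assumes 2 Hz waypoints (six points over 3 s) in the ego frame and implements the two conventions commonly described in the literature: the error at the single waypoint at t seconds, and the error averaged over all waypoints up to t (the ST-P3 protocol referenced in one row). The function `l2_at_horizons`, its parameters, and the constant `WAYPOINTS_PER_SECOND` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in metres, ego frame

# Assumption: planned and ground-truth trajectories are sampled at 2 Hz,
# so a horizon of t seconds covers the first round(2 * t) waypoints.
WAYPOINTS_PER_SECOND = 2


def l2_at_horizons(
    preds: Sequence[Sequence[Point]],
    gts: Sequence[Sequence[Point]],
    horizons: Sequence[float] = (1.0, 2.0, 3.0),
    cumulative: bool = False,
) -> Dict[float, float]:
    """Hypothetical helper: mean L2 displacement error (m) per horizon.

    cumulative=False: distance at the single waypoint at t seconds.
    cumulative=True:  distance averaged over all waypoints up to t (ST-P3 style).
    """
    if len(preds) != len(gts) or not preds:
        raise ValueError("need the same, non-zero number of predicted and GT trajectories")

    # Euclidean distance between each predicted waypoint and its ground truth.
    dists: List[List[float]] = []
    for pred, gt in zip(preds, gts):
        if len(pred) != len(gt):
            raise ValueError("trajectory lengths differ")
        dists.append([math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)])

    shortest = min(len(d) for d in dists)
    result: Dict[float, float] = {}
    for t in horizons:
        k = round(t * WAYPOINTS_PER_SECOND)  # number of waypoints within horizon t
        if not 1 <= k <= shortest:
            raise ValueError(f"horizon {t} s is outside the trajectory length")
        if cumulative:
            per_sample = [sum(d[:k]) / k for d in dists]
        else:
            per_sample = [d[k - 1] for d in dists]
        # Average over the batch of trajectories.
        result[t] = sum(per_sample) / len(per_sample)
    return result
```

For a single trajectory whose lateral offset from the ground truth grows by 0.1 m per waypoint (0.1 m to 0.6 m), the per-waypoint convention gives about 0.2, 0.4, and 0.6 m at 1, 2, and 3 s, while the cumulative convention gives about 0.15, 0.25, and 0.35 m. An "average" column is usually the mean over the three horizons. Because the two conventions diverge this much, L2 numbers are only comparable between methods evaluated under the same protocol.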
