
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

About

A primary hurdle for autonomous driving in urban environments is understanding complex and long-tail scenarios, such as challenging road conditions and subtle human behaviors. We introduce DriveVLM, an autonomous driving system that leverages Vision-Language Models (VLMs) for enhanced scene understanding and planning. DriveVLM integrates a unique chain of reasoning modules for scene description, scene analysis, and hierarchical planning. Furthermore, recognizing the limitations of VLMs in spatial reasoning and their heavy computational requirements, we propose DriveVLM-Dual, a hybrid system that combines the strengths of DriveVLM with the traditional autonomous driving pipeline. Experiments on both the nuScenes dataset and our SUP-AD dataset demonstrate the efficacy of DriveVLM and DriveVLM-Dual in handling complex and unpredictable driving conditions. Finally, we deploy DriveVLM-Dual on a production vehicle, verifying its effectiveness in real-world autonomous driving environments.
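The dual-system design above can be sketched in code: a slow VLM branch produces a linguistic scene analysis and a coarse waypoint proposal, which a fast traditional stack then refines. This is an illustrative sketch only; all class names, method names, and values below are hypothetical, not the paper's actual interfaces.

```python
# Hypothetical sketch of the DriveVLM-Dual hybrid flow: a VLM branch
# (scene description -> scene analysis -> hierarchical planning) proposes
# a coarse plan; a classical pipeline refines it at high frequency.
from dataclasses import dataclass
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (x, y) in the ego frame, metres


@dataclass
class SceneAnalysis:
    description: str              # linguistic description of the scene
    critical_objects: List[str]   # objects flagged as critical to the plan


def vlm_branch(camera_frames) -> Tuple[SceneAnalysis, List[Waypoint]]:
    """Slow branch: returns a scene analysis and a coarse, low-frequency
    waypoint proposal (placeholder values for illustration)."""
    analysis = SceneAnalysis(
        description="construction zone ahead; worker waving traffic through",
        critical_objects=["worker", "cone_row"],
    )
    coarse_plan = [(2.0, 0.1), (4.0, 0.4), (6.0, 0.9)]
    return analysis, coarse_plan


def classical_refine(coarse_plan: List[Waypoint]) -> List[Waypoint]:
    """Fast branch: the traditional pipeline refines the coarse plan,
    e.g. by local optimization against the latest perception state
    (here stubbed as a simple lateral smoothing)."""
    return [(x, round(y * 0.95, 3)) for x, y in coarse_plan]


analysis, coarse = vlm_branch(camera_frames=None)
refined = classical_refine(coarse)
```

The key design point the sketch captures is asymmetric rates: the VLM runs infrequently on hard scenes, while the classical planner keeps real-time control.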

Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao • 2024

Related benchmarks

Task                            | Dataset                   | Metric                   | Result | Rank
Open-loop planning              | nuScenes v1.0 (val)       | L2 (1s)                  | 0.15   | 59
Planning                        | nuScenes v1.0-trainval (val) | ST-P3 L2 Error (1s)   | 0.15   | 39
Open-loop trajectory prediction | nuScenes v1.0 (test)      | L2 Error (1s)            | 0.15   | 29
Open-loop planning              | nuScenes v1.0 (test)      | L2 Error (1s)            | 0.15   | 28
Open-loop planning              | nuScenes                  | L2 Error (1s)            | 0.15   | 20
Trajectory planning             | nuScenes                  | ST-P3 L2 Error (1s)      | 0.18   | 12
Motion planning                 | nuScenes                  | ST-P3 Collision (1s)     | 0.1    | 11
End-to-end motion planning      | nuScenes v1.0 (val)       | ST-P3 Collision Rate (1s)| 0.1    | 9
Trajectory prediction           | RoboDriveBench 1.0 (test) | L2 Error (Clean)         | 0.69   | 7
Collision robustness evaluation | RoboDriveBench            | Clean Avg Collision      | 0.29   | 7

(10 of 12 rows shown)
