
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

About

Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on LLMs. Capable of processing multi-frame video inputs and textual queries, DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users. Furthermore, DriveGPT4 predicts low-level vehicle control signals in an end-to-end fashion. These advanced capabilities are achieved through the use of a bespoke visual instruction tuning dataset, specifically tailored for autonomous driving applications, in conjunction with a mix-finetuning training strategy. DriveGPT4 represents the pioneering effort to leverage LLMs for the development of an interpretable end-to-end autonomous driving solution. Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4. Additionally, fine-tuning on domain-specific data enables DriveGPT4 to yield close or even improved results in terms of autonomous driving grounding when contrasted with GPT4-V.

Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee. K. Wong, Zhenguo Li, Hengshuang Zhao • 2023

Related benchmarks

Task                                                     Dataset          Result          Rank
Driving Visual Question Answering                        nuScenes-QA      QA EM 52.6      10
Question Answering                                       Waymo-QA         QA EM 49.4      10
Driving decision-making                                  BDD-X            BLEU-4 0.3      8
Driving Action Justification                             BDD-X (test)     B4 Score 9.4    7
Driving Action Explanation                               BDD-X (test)     B4 Score 30     7
Cross-Modal Conflict Resolution and Scene Consistency    nuScenes-QA      CRR 4.1         6
Driving Reasoning                                        BDD-X (All)      BLEU-4 0.183    6
Driving Reasoning                                        BDD-X Easy       BLEU-4 0.204    6
Driving Reasoning                                        BDD-X Medium     BLEU-4 16.9     6
Driving Reasoning                                        BDD-X Hard       BLEU-4 12.3     6
Showing 10 of 12 rows
