Prune2Drive: A Plug-and-Play Framework for Accelerating Vision-Language Models in Autonomous Driving

About

Vision-Language Models (VLMs) have emerged as a promising paradigm in autonomous driving (AD), providing a unified framework for perception and decision-making. However, their real-world deployment is hindered by significant computational overhead when processing high-resolution, multi-view images. This complexity stems from the massive number of visual tokens, which increases inference latency and memory consumption due to the quadratic complexity of self-attention. To address these challenges, we propose Prune2Drive, a plug-and-play visual token pruning framework for multi-view VLMs in AD. Prune2Drive introduces two core innovations: (i) a diversity-aware token selection mechanism that prioritizes semantic and spatial coverage across views, and (ii) a view-adaptive pruning controller that automatically learns optimal pruning ratios based on camera importance to downstream tasks. Unlike prior methods, Prune2Drive requires no model retraining or access to attention maps, ensuring compatibility with modern efficient attention implementations. Extensive experiments on the DriveLM and DriveLMM-o1 benchmarks demonstrate that Prune2Drive achieves significant speedups and memory savings with minimal performance impact. When retaining only 10% of visual tokens, our method achieves a 6.40x speedup in the prefilling phase and consumes only 13.4% of the original FLOPs, with a mere 3% average performance drop on the DriveLM benchmark. Code is available at: https://github.com/MinhaoXiong/Prune2Drive.git

Minhao Xiong, Zichen Wen, Zhuangcheng Gu, Xuyang Liu, Rui Zhang, Hengrui Kang, Jiabing Yang, Junyuan Zhang, Weijia Li, Conghui He, Yafei Wang, Linfeng Zhang • 2025
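
The abstract describes diversity-aware token selection with per-view pruning ratios but does not spell out the algorithm. The sketch below illustrates the general idea only. It assumes greedy farthest-point selection under cosine distance as a stand-in for the selection mechanism, and it takes the per-view keep ratios as hand-set inputs rather than learning them. The names `cosine_distance`, `select_diverse_tokens`, and `prune_views` are hypothetical and are not taken from the Prune2Drive code. Like the method described above, the sketch uses only the token embeddings and needs no attention maps.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as orthogonal to everything
    return 1.0 - dot / (norm_a * norm_b)


def select_diverse_tokens(tokens, keep):
    """Greedy farthest-point selection (an assumed stand-in, not the paper's method).

    tokens: list of embedding vectors (lists of floats) for one camera view.
    keep:   number of tokens to retain.
    Returns the sorted indices of the retained tokens.
    """
    n = len(tokens)
    keep = max(0, min(keep, n))
    if keep == 0:
        return []

    selected = [0]  # assumption: seed the selection with the first token
    # min_dist[i] is the distance from token i to its nearest selected token.
    min_dist = [cosine_distance(tokens[0], t) for t in tokens]
    min_dist[0] = float("-inf")  # never pick an already selected token again

    while len(selected) < keep:
        # Pick the token that is farthest from everything selected so far.
        nxt = max(range(n), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist[nxt] = float("-inf")
        for i in range(n):
            if min_dist[i] != float("-inf"):
                d = cosine_distance(tokens[nxt], tokens[i])
                min_dist[i] = min(min_dist[i], d)

    return sorted(selected)


def prune_views(views, keep_ratios):
    """Apply a separate keep ratio to each camera view.

    views:       dict mapping view name -> list of token vectors.
    keep_ratios: dict mapping view name -> fraction of tokens to keep
                 (hand-set here; the paper's controller chooses these).
    Returns a dict mapping view name -> sorted indices of retained tokens.
    """
    kept = {}
    for name, tokens in views.items():
        budget = max(1, round(keep_ratios[name] * len(tokens)))
        kept[name] = select_diverse_tokens(tokens, budget)
    return kept


# Toy example: two near-duplicate pairs of tokens per view.
toy = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
result = prune_views({"front": toy, "back": toy}, {"front": 0.5, "back": 0.25})
print(result)
```

With a budget of two tokens, the selection keeps one token from each near-duplicate pair instead of two similar ones, which is the coverage behavior that a diversity criterion is meant to provide. Giving the `back` view a smaller ratio than the `front` view mimics, in a very simplified way, the idea of spending more of the token budget on cameras that matter more to the task.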

Related benchmarks

Task	Dataset	Result
Object Hallucination Evaluation	POPE	--	2019
Visual Question Answering	VizWiz	Accuracy 55.7	1820
Multimodal Evaluation	MME	--	727
Driving VQA	DriveLM (test)	Accuracy 80	11
Reasoning	DriveLMM-o1 NuScenes	Risk Assessment Accuracy 69.28	11
Visual Question Answering	GQA	BD Metric 59.2	10
Reasoning and Generation	DriveLM (test)	Accuracy 80	5
Video Autonomous Driving	OmniDrive	BLEU 38.6	4

