SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodal LLMs

About

Multimodal Large Language Models (MLLMs) typically process a large number of visual tokens, which incurs considerable computational overhead even though many of these tokens are redundant. Existing visual token pruning methods primarily select the most salient tokens based on attention scores, which results in semantically incomplete token selections. In this paper, we propose a novel visual token pruning strategy, Saliency-Coverage Oriented token Pruning for Efficient MLLMs (SCOPE), which jointly models the saliency and the coverage of the selected visual tokens to better preserve semantic completeness. Specifically, we define a set coverage for a given set of selected tokens, computed from the relationships between tokens. We then define a token-coverage gain for each unselected token, quantifying how much additional coverage would be obtained by including it. Integrating the saliency score into the token-coverage gain yields our SCOPE score, and we iteratively select the token with the highest SCOPE score. We conduct extensive experiments on multiple vision-language understanding benchmarks using the LLaVA-1.5 and LLaVA-Next models. Experimental results demonstrate that our method consistently outperforms prior approaches. Our code is available at https://github.com/kinredon/SCOPE.
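The abstract describes the selection loop only at a high level and does not give the exact set-coverage or SCOPE-score formulas, so the following is a minimal sketch under stated assumptions, not the authors' implementation (that lives in the linked repository). It assumes token relationships are cosine similarities clamped at 0, that set coverage is the facility-location sum of each token's best similarity to the selected set, that the token-coverage gain is the marginal increase in that sum, and that the SCOPE score is `saliency[j] ** alpha * gain[j]`. The function `scope_select`, the exponent `alpha`, and the toy inputs are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def scope_select(features: Sequence[Sequence[float]],
                 saliency: Sequence[float],
                 k: int,
                 alpha: float = 1.0) -> List[int]:
    """Greedily pick k token indices by a saliency-weighted coverage gain.

    ASSUMPTIONS (not taken from the paper): relationships are clamped cosine
    similarities; set coverage is sum_i max_{j in S} sim[i][j]; the score is
    saliency[j] ** alpha * gain[j]; saliency values are non-negative.
    `alpha` is a hypothetical knob.
    """
    n = len(features)
    # Pairwise relationships, clamped at 0 so negative similarity never counts as coverage.
    sim = [[max(0.0, cosine(features[i], features[j])) for j in range(n)]
           for i in range(n)]
    covered = [0.0] * n  # covered[i]: best similarity of token i to the selected set
    selected: List[int] = []
    chosen = set()
    for _ in range(min(k, n)):
        best_j, best_score = -1, -math.inf
        for j in range(n):
            if j in chosen:
                continue
            # Token-coverage gain: how much set coverage rises if j is added.
            gain = sum(max(0.0, sim[i][j] - covered[i]) for i in range(n))
            score = (saliency[j] ** alpha) * gain
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        chosen.add(best_j)
        # Update every token's coverage with the newly selected token.
        covered = [max(covered[i], sim[i][best_j]) for i in range(n)]
    return selected


if __name__ == "__main__":
    # Tokens 0 and 1 are duplicates; token 2 carries different content.
    features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    saliency = [0.9, 0.85, 0.3]
    top_k = sorted(range(len(saliency)), key=lambda i: -saliency[i])[:2]
    print("saliency-only top-2:", top_k)                                   # [0, 1]
    print("coverage-aware pick:", scope_select(features, saliency, k=2))   # [0, 2]
```

In this toy case a saliency-only top-2 keeps both duplicate tokens, while the coverage-aware loop drops the redundant token 1 once token 0 already covers it and picks the less salient but uncovered token 2, which is the behavior the abstract attributes to combining saliency with coverage.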

Jinhong Deng, Wen Li, Joey Tianyi Zhou, Yang He · 2025

Related benchmarks

Task	Dataset	Result
Object Hallucination Evaluation	POPE	--	2019
Visual Question Answering	GQA	Accuracy 60.1	1425
Multimodal Understanding	MMBench	--	847
Multimodal Evaluation	MME	Score 1.68e+3	727
Visual Question Answering	ChartQA	--	519
Diagram Question Answering	AI2D	AI2D Accuracy 78.21	387
Chart Question Answering	ChartQA	--	371
Document Visual Question Answering	DocVQA	ANLS 85.4	301
Multimodal Understanding	MMBench CN	--	254
Video Understanding	VideoMME	Overall Score 86.4	222

Showing 10 of 23 rows
