Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models

About

Video large language models (VideoLLMs) excel at video understanding, but face efficiency challenges due to the quadratic complexity of processing abundant visual tokens. Our systematic analysis of token compression methods for VideoLLMs reveals two critical issues: (i) they overlook distinctive visual signals across frames, leading to information loss; (ii) they suffer from implementation constraints, causing incompatibility with modern architectures or efficient operators. To address these challenges, we distill three design principles for VideoLLM token compression and propose a plug-and-play inference acceleration framework, "Video Compression Commander" (VidCom2). By quantifying each frame's uniqueness, VidCom2 adaptively adjusts compression intensity across frames, preserving essential information while reducing redundancy in video sequences. Extensive experiments across various VideoLLMs and benchmarks demonstrate the superior performance and efficiency of VidCom2. With only 25% of the visual tokens, VidCom2 achieves 99.6% of the original performance on LLaVA-OV while reducing LLM generation latency by 70.8%. Notably, our Frame Compression Adjustment strategy is compatible with other token compression methods and further improves their performance. Our code is available at https://github.com/xuyang-liu16/VidCom2.
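The idea of adjusting compression intensity by frame uniqueness can be illustrated with a minimal sketch. This is not the authors' implementation. The uniqueness score (one minus the cosine similarity between a frame's mean token and the video's mean token), the softmax-style weighting, and the names `frame_uniqueness`, `allocate_budgets`, `retention`, and `temperature` are all assumptions made for illustration. See the linked repository for the actual method.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

def frame_uniqueness(frames):
    """Assumed score: 1 - cos(frame mean token, video mean token)."""
    frame_means = [mean_vector(f) for f in frames]
    video_mean = mean_vector(frame_means)
    return [1.0 - cosine(m, video_mean) for m in frame_means]

def allocate_budgets(uniqueness, tokens_per_frame, retention=0.25, temperature=1.0):
    """Split a global token budget across frames, favoring unique frames."""
    total_budget = round(retention * tokens_per_frame * len(uniqueness))
    peak = max(uniqueness)
    # Softmax-style weights (hypothetical choice), shifted for numerical stability.
    weights = [math.exp((u - peak) / temperature) for u in uniqueness]
    norm = sum(weights)
    budgets = [min(tokens_per_frame, int(total_budget * w / norm)) for w in weights]
    # Hand out tokens lost to rounding, most unique frames first.
    leftover = total_budget - sum(budgets)
    order = sorted(range(len(budgets)), key=lambda i: -uniqueness[i])
    while leftover > 0:
        progressed = False
        for i in order:
            if leftover == 0:
                break
            if budgets[i] < tokens_per_frame:
                budgets[i] += 1
                leftover -= 1
                progressed = True
        if not progressed:
            break
    return budgets

# Toy video: two near-duplicate frames and one distinct frame, 4 tokens each.
frames = [
    [[1.0, 0.0]] * 4,
    [[1.0, 0.1]] * 4,
    [[0.0, 1.0]] * 4,
]
scores = frame_uniqueness(frames)
budgets = allocate_budgets(scores, tokens_per_frame=4, retention=0.5)
print(scores, budgets)
```

In this toy setup the distinct third frame keeps more tokens than either near-duplicate, while the total stays at the global budget. Which tokens to keep within each frame is a separate decision that this sketch does not cover.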

Xuyang Liu, Yiyu Wang, Junpeng Ma, Linfeng Zhang • 2025

Related benchmarks

Task	Dataset	Metric	Result
Video Understanding	MVBench	Accuracy	53.4	635
Video Understanding	VideoMME	Score (Overall)	68.1	369
Long Video Understanding	LongVideoBench	Score	61	290
Long Video Understanding	MLVU	--	--	265
Video Understanding	MLVU	Score	58.6	233
Video Understanding	VideoMME	Overall Score	64.7	222
Video Understanding	MLVU	Accuracy	59.7	147
Video Understanding	LongVideoBench	Accuracy	53.7	128
Long Video Understanding	LongVideo-Bench	Score	59.6	99
Video Understanding	LVBench	Overall Accuracy	102.1	95

Showing 10 of 24 rows
