FastVID: Dynamic Density Pruning for Fast Video Large Language Models

About

Video Large Language Models have demonstrated strong video understanding capabilities, yet their practical deployment is hindered by substantial inference costs caused by redundant video tokens. Existing pruning techniques fail to effectively exploit the spatiotemporal redundancy present in video data. To bridge this gap, we perform a systematic analysis of video redundancy from two perspectives: temporal context and visual context. Leveraging these insights, we propose Dynamic Density Pruning for Fast Video LLMs, termed FastVID. Specifically, FastVID dynamically partitions videos into temporally ordered segments to preserve temporal structure and applies a density-based token pruning strategy to maintain essential spatial and temporal information. Our method significantly reduces computational overhead while maintaining temporal and visual integrity. Extensive evaluations show that FastVID achieves state-of-the-art performance across various short- and long-video benchmarks on leading Video LLMs, including LLaVA-OneVision, LLaVA-Video, Qwen2-VL, and Qwen2.5-VL. Notably, on LLaVA-OneVision-7B, FastVID prunes 90.3% of video tokens, reduces FLOPs to 8.3%, and accelerates the LLM prefill stage by 7.1×, while maintaining 98.0% of the original accuracy. The code is available at https://github.com/LunarShen/FastVID.

Leqi Shen, Guoqiang Gong, Tao He, Yifeng Zhang, Pengzhang Liu, Sicheng Zhao, Guiguang Ding • 2025
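
The abstract names two steps: splitting the video into temporally ordered segments, then pruning tokens by density within each segment. The sketch below is one possible reading of that idea, not the authors' implementation (see the linked repository for that). Everything specific in it is an assumption made for illustration: the cosine-similarity cut between mean-pooled frame features, the `sim_threshold` and `keep_ratio` parameters, the density-peaks style score, and all function names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def segment_frames(frame_feats, sim_threshold=0.9):
    """Group frame indices into temporally ordered segments.

    Assumption: a new segment starts whenever the similarity between
    adjacent frame features drops below sim_threshold.
    """
    if not frame_feats:
        return []
    segments = [[0]]
    for i in range(1, len(frame_feats)):
        if cosine(frame_feats[i - 1], frame_feats[i]) < sim_threshold:
            segments.append([i])
        else:
            segments[-1].append(i)
    return segments


def density_peak_scores(tokens, k=2):
    """Score tokens by local density times distance to a denser token.

    Assumption: a density-peaks style score, so that one representative
    per cluster ranks above its near-duplicates.
    """
    n = len(tokens)
    dist = [[math.dist(a, b) for b in tokens] for a in tokens]
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] ** 2 for j in range(n) if j != i)[:k]
        rho.append(math.exp(-sum(nearest) / len(nearest)) if nearest else 1.0)
    scores = []
    for i in range(n):
        # Ties in density are broken by index so duplicates do not both win.
        denser = [dist[i][j] for j in range(n) if (rho[j], -j) > (rho[i], -i)]
        delta = min(denser) if denser else max(dist[i])
        scores.append(rho[i] * delta)
    return scores


def prune_tokens(tokens, keep_ratio, k=2):
    """Return indices of the kept tokens, in their original order."""
    n_keep = max(1, round(len(tokens) * keep_ratio))
    scores = density_peak_scores(tokens, k)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def segment_then_prune(frame_tokens, keep_ratio=0.1, sim_threshold=0.9):
    """Segment frames, then prune tokens inside each segment.

    frame_tokens[f][t] is the feature vector of token t in frame f.
    Returns kept (frame, token) index pairs in temporal order.
    """
    # Mean-pool each frame's tokens to get one feature vector per frame.
    frame_feats = [[sum(col) / len(toks) for col in zip(*toks)]
                   for toks in frame_tokens]
    kept = []
    for seg in segment_frames(frame_feats, sim_threshold):
        ids = [(f, t) for f in seg for t in range(len(frame_tokens[f]))]
        vecs = [frame_tokens[f][t] for f, t in ids]
        kept.extend(ids[i] for i in prune_tokens(vecs, keep_ratio, k=2))
    return kept
```

Calling `segment_then_prune(frame_tokens, keep_ratio=0.097)` would retain roughly the 9.7% of tokens left after the 90.3% pruning rate quoted in the abstract, though the exact count depends on how the segments fall, since the ratio is applied per segment.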

Related benchmarks

Task	Dataset	Metric	Result
Video Understanding	MVBench	Accuracy	61.2	563
Video Question Answering	ActivityNet-QA	Accuracy	49.6	418
Video Understanding	VideoMME	Score (Overall)	63.7	357
Long Video Understanding	LongVideoBench	Score	60.9	269
Video Question Answering	VideoMME	Accuracy	59	251
Video Understanding	VideoMME	Overall Score	63.7	222
Video Understanding	MLVU	Score	67.31	221
Long Video Understanding	MLVU	--	--	205
Video Question Answering	MLVU	Accuracy	61.1	194
Video Understanding	EgoSchema	EgoSchema Score	58.8	185

