
FastVID: Dynamic Density Pruning for Fast Video Large Language Models

About

Video Large Language Models (Video LLMs) have demonstrated strong video understanding capabilities, yet their practical deployment is hindered by the substantial inference cost of redundant video tokens. Existing pruning techniques fail to effectively exploit the spatiotemporal redundancy present in video data. To bridge this gap, we perform a systematic analysis of video redundancy from two perspectives: temporal context and visual context. Leveraging these insights, we propose Dynamic Density Pruning for Fast Video LLMs, termed FastVID. Specifically, FastVID dynamically partitions videos into temporally ordered segments to preserve temporal structure and applies a density-based token pruning strategy to maintain essential spatial and temporal information. Our method significantly reduces computational overhead while maintaining temporal and visual integrity. Extensive evaluations show that FastVID achieves state-of-the-art performance across various short- and long-video benchmarks on leading Video LLMs, including LLaVA-OneVision, LLaVA-Video, Qwen2-VL, and Qwen2.5-VL. Notably, on LLaVA-OneVision-7B, FastVID prunes 90.3% of video tokens, reduces FLOPs to 8.3%, and accelerates the LLM prefill stage by 7.1×, while maintaining 98.0% of the original accuracy. The code is available at https://github.com/LunarShen/FastVID.
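To make the two stages described above more concrete, here is a minimal, dependency-free Python sketch. It is not the authors' implementation (that lives in the linked repository). The similarity threshold used to cut segments, the DPC-KNN-style density-peak score used to choose representative tokens, the nearest-anchor averaging of pruned tokens, the mean-of-tokens frame feature, and every function name (`segment_frames`, `density_peak_select`, `merge_into_anchors`, `prune_video`) are hypothetical choices made for illustration; tokens are plain lists of floats.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def segment_frames(frame_feats: Sequence[Vector], threshold: float = 0.9) -> List[List[int]]:
    """Split frame indices into temporally ordered segments.

    Hypothetical rule: open a new segment whenever the cosine similarity of
    consecutive frame features falls below `threshold`.
    """
    if not frame_feats:
        return []
    segments = [[0]]
    for t in range(1, len(frame_feats)):
        if cosine(frame_feats[t - 1], frame_feats[t]) < threshold:
            segments.append([t])
        else:
            segments[-1].append(t)
    return segments


def density_peak_select(tokens: Sequence[Vector], keep: int, k: int = 3) -> List[int]:
    """Return indices (in original order) of `keep` density-peak tokens.

    DPC-KNN-style score (an assumed stand-in for FastVID's actual criterion):
      rho_i   = exp(-mean squared distance to the k nearest neighbours)
      delta_i = distance to the nearest denser token (ties broken by index);
                the one densest token gets the maximum pairwise distance
    Tokens with the largest rho_i * delta_i are kept.
    """
    n = len(tokens)
    if keep <= 0:
        return []
    if keep >= n:
        return list(range(n))
    dist = [[math.dist(tokens[i], tokens[j]) for j in range(n)] for i in range(n)]
    kk = max(1, min(k, n - 1))
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:kk]
        rho.append(math.exp(-sum(d * d for d in nearest) / kk))
    max_d = max(max(row) for row in dist)
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(denser) if denser else max_d)
    score = [r * d for r, d in zip(rho, delta)]
    top = sorted(range(n), key=lambda i: score[i], reverse=True)[:keep]
    return sorted(top)


def merge_into_anchors(tokens: Sequence[Vector], anchors: List[int]) -> List[Vector]:
    """Average each token into its nearest anchor (anchors always keep themselves)."""
    groups = {a: [] for a in anchors}
    for i, tok in enumerate(tokens):
        target = i if i in groups else min(anchors, key=lambda a: math.dist(tok, tokens[a]))
        groups[target].append(tok)
    return [[sum(col) / len(col) for col in zip(*groups[a])] for a in anchors]


def prune_video(frames: Sequence[Sequence[Vector]], keep_ratio: float = 0.1,
                threshold: float = 0.9) -> List[List[Vector]]:
    """Segment the video, then keep about `keep_ratio` of each segment's tokens."""
    # Frame feature = mean of its tokens (an assumption made for this sketch).
    frame_feats = [[sum(col) / len(col) for col in zip(*f)] for f in frames]
    pruned = []
    for seg in segment_frames(frame_feats, threshold):
        pool = [tok for t in seg for tok in frames[t]]  # tokens in temporal order
        anchors = density_peak_select(pool, max(1, round(keep_ratio * len(pool))))
        pruned.append(merge_into_anchors(pool, anchors))
    return pruned


if __name__ == "__main__":
    scene_a = [[1.0, 0.0], [1.1, 0.1], [0.9, -0.1], [1.0, 0.05]]
    scene_b = [[0.0, 1.0], [0.1, 1.1], [-0.1, 0.9], [0.05, 1.0]]
    video = [scene_a, scene_a, scene_b, scene_b]  # 4 frames, 16 tokens, 2 scenes
    pruned = prune_video(video, keep_ratio=0.25)
    print(len(pruned), [len(seg) for seg in pruned])  # 2 [2, 2]
```

The density-peak score favors tokens that sit in dense regions of feature space yet are far from any denser token, so each kept token stands in for a distinct group of similar, redundant tokens; averaging the remaining tokens into their nearest anchor is one simple way to avoid discarding their information outright. Because each segment is processed independently and in order, the pruned tokens stay temporally ordered.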

Leqi Shen, Guoqiang Gong, Tao He, Yifeng Zhang, Pengzhang Liu, Sicheng Zhao, Guiguang Ding • 2025

Related benchmarks

Task | Dataset | Result | Rank
Video Understanding | MVBench | Accuracy: 61.2 | 425
Video Question Answering | ActivityNet-QA | Accuracy: 49.6 | 376
Video Understanding | VideoMME | Score (Long): 55.2 | 248
Long Video Understanding | LongVideoBench | Score: 60.9 | 248
Video Understanding | VideoMME | Overall Score: 63.7 | 222
Video Understanding | MLVU | Score: 67.31 | 221
Video Understanding | EgoSchema | EgoSchema Score: 58.8 | 158
Long Video Understanding | MLVU | Score: 68.2 | 154
Video Understanding | LongVideoBench | LongVideoBench Score: 57.8 | 92
Video Understanding | Video-MME | Overall Score: 60.89 | 92
(10 of 36 rows shown)
