
OmniSelect: Dynamic Modality-Aware Token Compression for Efficient Omni-modal Large Language Models

About

Omnimodal large language models (OmniLLMs) have recently gained increasing attention for unified audio-video understanding. However, processing long multimodal token sequences introduces substantial computational overhead, making efficient token compression crucial. Existing methods typically rely on fixed, modality-specific guidance, which fails to account for the varying importance of modalities across different queries. To address this limitation, we propose $\textbf{OmniSelect}$, a training-free, modality-adaptive token pruning framework that dynamically selects appropriate compression strategies for multimodal inputs. Specifically, we leverage a lightweight AudioCLIP model to estimate cross-modal relevance and assign each input to one of three pruning regimes: Audio-Centric, Video-Centric, or Uniform pruning. Based on these relevance scores, OmniSelect further performs fine-grained token pruning within each temporal group, adaptively allocating pruning ratios to preserve informative tokens across modalities. By explicitly modeling modality preference and enabling dynamic strategy selection, OmniSelect effectively avoids the pitfalls of one-size-fits-all compression. Extensive experiments demonstrate that our method achieves efficient multimodal token reduction while maintaining strong performance, without requiring any additional training.
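The regime selection and per-group pruning described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `margin` threshold, the `base_keep` and `shift` ratios, and the top-k selection rule are all hypothetical, and the relevance scores are taken as given inputs rather than computed with AudioCLIP.

```python
import math
from typing import List, Tuple


def choose_regime(audio_rel: float, video_rel: float, margin: float = 0.1) -> str:
    """Pick a pruning regime from two query-relevance scores.

    `margin` is an assumed threshold, not a value from the paper.
    """
    if audio_rel - video_rel > margin:
        return "audio_centric"
    if video_rel - audio_rel > margin:
        return "video_centric"
    return "uniform"


def keep_ratios(regime: str, base_keep: float = 0.5, shift: float = 0.25) -> Tuple[float, float]:
    """Return (audio_keep, video_keep) fractions for a regime.

    `base_keep` and `shift` are illustrative assumptions.
    """
    if regime == "audio_centric":
        return (base_keep + shift, base_keep - shift)
    if regime == "video_centric":
        return (base_keep - shift, base_keep + shift)
    return (base_keep, base_keep)


def prune_group(scores: List[float], keep_ratio: float) -> List[int]:
    """Keep the highest-scoring tokens of one temporal group.

    Returns the kept token indices in their original order.
    """
    if not scores:
        return []
    k = min(len(scores), max(1, math.ceil(len(scores) * keep_ratio)))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Usage: one temporal group with four audio tokens and four video tokens.
regime = choose_regime(audio_rel=0.8, video_rel=0.3)
audio_keep, video_keep = keep_ratios(regime)
kept_audio = prune_group([0.2, 0.9, 0.4, 0.7], audio_keep)
kept_video = prune_group([0.1, 0.9, 0.4, 0.7], video_keep)
```

In this sketch an audio-dominant query keeps more audio tokens and fewer video tokens in every temporal group, while a query with no clear preference falls back to equal ratios for both modalities.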

Morunliu Yang, Ruotao Xu, Le Li, Yue Wang, Jianxin Zhang, Juntao Li, Yihang Lou, Siwei Feng, Peifeng Li • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Understanding | VideoMME | Score (Overall): 66.33 | 357 |
| Audio-visual understanding | DailyOmni | Average Score: 60.65 | 83 |
| Omnimodal Understanding | WorldSense v1.0 (test) | Tech & Science Score: 50.61 | 24 |
| Audio-Video Understanding | OmniVideoBench | Avg Latency: 32.7 | 23 |
| Video Question Answering | WorldSense | Accuracy (Tech & Science): 45.51 | 10 |
