
ASAP: Attention-Shift-Aware Pruning for Efficient LVLM Inference

About

While Large Vision-Language Models (LVLMs) demonstrate exceptional multi-modal capabilities, the quadratic computational cost of processing high-resolution visual tokens remains a critical bottleneck. Recent token reduction strategies attempt to accelerate inference, but they exploit attention values inadequately and fail to address token redundancy. More critically, they overlook the "attention shift" phenomenon inherent in LVLMs, which skews token attention scores. In this work, we propose ASAP, a novel training-free, KV-Cache-compatible pruning recipe that addresses these limitations. First, we mitigate attention shift with a dynamic bidirectional soft attention mask, ensuring that genuinely informative tokens are selected rather than relying on naive attention-based selection. Second, we posit that high semantic redundancy within the token set degrades performance, and therefore introduce a weighted soft merging component that merges semantically similar tokens, preserving only the most feature-dense visual patches for subsequent layers. ASAP achieves virtually lossless compression of visual context, retaining 99.02% of the original LLaVA-NeXT-7B performance while cutting FLOPs by ~80%.
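The two-stage recipe described above (attention-based selection followed by redundancy-aware merging) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `asap_prune`, the plain top-k selection (standing in for the paper's bidirectional soft attention mask), and the cosine-similarity grouping with threshold `merge_threshold` are all assumptions made for the sketch.

```python
import torch

def asap_prune(visual_tokens, attn_scores, keep_ratio=0.2, merge_threshold=0.9):
    """Hypothetical sketch of attention-based pruning plus weighted soft merging.

    visual_tokens: (N, d) visual token features
    attn_scores:   (N,) per-token attention scores (assumed already corrected
                   for attention shift; the paper's soft mask is not modeled here)
    """
    N, d = visual_tokens.shape
    k = max(1, int(N * keep_ratio))
    # 1) Keep the k highest-scoring tokens (plain top-k stands in for the
    #    paper's dynamic bidirectional soft attention mask).
    keep_idx = torch.topk(attn_scores, k).indices
    kept = visual_tokens[keep_idx]
    w = attn_scores[keep_idx]
    # 2) Weighted soft merging: group kept tokens whose cosine similarity
    #    exceeds merge_threshold and average each group, weighted by score.
    normed = torch.nn.functional.normalize(kept, dim=-1)
    sim = normed @ normed.T
    merged, used = [], torch.zeros(k, dtype=torch.bool)
    for i in range(k):
        if used[i]:
            continue
        group = (sim[i] > merge_threshold) & ~used  # includes token i itself
        used |= group
        gw = w[group].unsqueeze(-1)
        merged.append((kept[group] * gw).sum(0) / gw.sum())
    return torch.stack(merged)
```

With `keep_ratio=1.0` and two pairs of duplicate tokens, the four inputs collapse to two merged tokens, illustrating how redundant visual patches are fused rather than all retained.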

Surendra Pathak, Bo Han • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Hallucination Evaluation | POPE | – | – | 1455 |
| Visual Question Answering | VQA v2 | Accuracy | 79.46 | 1362 |
| Text-based Visual Question Answering | TextVQA | Accuracy | 60.69 | 807 |
| Multimodal Evaluation | MME | Score | 1550 | 658 |
| Science Question Answering | ScienceQA (SQA) | Accuracy | 72.89 | 273 |
| Multimodal Benchmarking | MMBench CN | Score | 82.47 | 129 |
| Visual Reasoning | GQA | Accuracy | 62.43 | 93 |
| Multimodal Benchmark | MMBench (MMB) | Accuracy | 68.29 | 81 |
| Visual Question Answering | GQA | GQA Score | 62.31 | 37 |
| Multimodal Understanding | VQAv2, GQA, VQAText, MMB, MMVet | VQAv2 Accuracy | 81.19 | 7 |
