ASAP: Attention-Shift-Aware Pruning for Efficient LVLM Inference
About
While Large Vision-Language Models (LVLMs) demonstrate exceptional multi-modal capabilities, the quadratic computational cost of processing high-resolution visual tokens remains a critical bottleneck. Recent token reduction strategies attempt to accelerate inference, but they inadequately exploit attention values and fail to address token redundancy. More critically, they overlook the "attention shift" phenomenon inherent in LVLMs, which skews token attention scores. In this work, we propose ASAP, a novel training-free, KV-Cache-compatible pruning recipe that addresses these limitations. First, we mitigate the attention shift with a dynamic bidirectional soft attention mask, so that genuinely informative tokens are selected instead of relying on naive attention-based selection. Second, we posit that high semantic redundancy within the token set degrades performance. We therefore introduce a weighted soft merging component that merges semantically similar tokens, preserving only the most feature-dense visual patches for subsequent layers. ASAP achieves virtually lossless compression of visual context, retaining 99.02% of the original LLaVA-NeXT-7B performance while reducing FLOPs by ~80%.
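The abstract does not spell out the merging rule, so the following is only a rough illustration of the general idea of pruning followed by weighted merging, not ASAP's actual procedure. It assumes that each visual token already has a non-negative importance score, that similarity is measured by cosine similarity, and that the scores double as merge weights. The function name `prune_and_merge` and its parameters are hypothetical.

```python
import numpy as np

def prune_and_merge(tokens, scores, keep):
    """Generic sketch (assumption, not the paper's algorithm):
    keep the `keep` highest-scoring tokens, then fold every dropped token
    into its most similar kept token as a score-weighted average."""
    order = np.argsort(-scores, kind="stable")
    kept_idx = np.sort(order[:keep])   # preserve original token order
    drop_idx = np.sort(order[keep:])

    # Cosine similarity between dropped tokens and kept tokens.
    unit = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    sim = unit[drop_idx] @ unit[kept_idx].T      # shape: (n_drop, keep)
    target = sim.argmax(axis=1)                  # nearest kept token per dropped token

    # Accumulate score-weighted features and the matching total weights.
    merged = tokens[kept_idx] * scores[kept_idx, None]
    weight = scores[kept_idx].copy()
    np.add.at(merged, target, tokens[drop_idx] * scores[drop_idx, None])
    np.add.at(weight, target, scores[drop_idx])
    return merged / weight[:, None], kept_idx

# Toy input: four 2-D "tokens" with made-up importance scores.
tokens = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 2.0]])
scores = np.array([0.4, 0.1, 0.3, 0.2])
merged, kept_idx = prune_and_merge(tokens, scores, keep=2)
```

In this toy input, the two dropped tokens are absorbed by the kept tokens they point toward most closely, so the output has `keep` rows while still carrying a weighted trace of the discarded features. The sketch assumes the scores of the kept tokens are strictly positive, since they are used as divisors.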
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Hallucination Evaluation | POPE | -- | 1455 |
| Visual Question Answering | VQA v2 | Accuracy 79.46 | 1362 |
| Text-based Visual Question Answering | TextVQA | Accuracy 60.69 | 807 |
| Multimodal Evaluation | MME | Score 1.55e+3 | 658 |
| Science Question Answering | ScienceQA (SQA) | Accuracy 72.89 | 273 |
| Multimodal Benchmarking | MMBench CN | Score 82.47 | 129 |
| Visual Reasoning | GQA | Accuracy 62.43 | 93 |
| Multimodal Benchmark | MMBench (MMB) | Accuracy 68.29 | 81 |
| Visual Question Answering | GQA | GQA Score 62.31 | 37 |
| Multimodal Understanding | VQAv2, GQA, VQAText, MMB, MMVet | VQAv2 Accuracy 81.19 | 7 |