
Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference

About

Vision-Language Models suffer severe KV cache pressure at inference, as a single image is often encoded into thousands of tokens. Most existing methods exploit token sparsity through token pruning, but permanently discarding visual content causes substantial degradation on fine-grained perception tasks. This motivates a complementary axis, feature sparsity: under a fixed KV cache budget, compressing the channel dimension preserves more visual tokens at the same memory cost. Prior Key channel pruning methods, however, face a structural trade-off: token-wise channel pruning is expressive but unstructured and slow, while head-wise pruning is hardware-friendly but less robust. We resolve this with RotateK, a rotation-based structured Key channel pruning framework. RotateK applies an online PCA-based rotation that aligns token-dependent channel importance into a shared low-dimensional subspace, enabling accurate pruning under lightweight head-wise masks; a fused Triton attention kernel operates directly on sparse-channel Keys for efficient decoding. Experiments on two representative VLM backbones show that RotateK consistently outperforms prior Key channel pruning in both accuracy and decoding latency, while joint token-channel pruning improves over token-only baselines at matched KV cache budgets.
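The rotate-then-prune idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it swaps the online PCA for a batch PCA over all Keys of one head, uses plain NumPy instead of a fused Triton kernel, and every function name (`pca_rotation`, `prune_keys`, `scores_from_pruned`, `make_toy_keys`) is hypothetical. It only shows why an orthogonal rotation leaves attention logits unchanged and why, when Keys concentrate in a low-dimensional subspace, a single head-wise mask over the rotated channels loses little.

```python
import numpy as np


def pca_rotation(keys):
    """Orthogonal (head_dim, head_dim) matrix of principal directions of `keys`.

    keys: (num_tokens, head_dim) Keys of one attention head.
    Columns are sorted by decreasing energy. Batch PCA on the uncentered
    second-moment matrix is an assumption of this sketch, not the paper's
    online procedure.
    """
    second_moment = keys.T @ keys
    eigvals, eigvecs = np.linalg.eigh(second_moment)  # eigenvalues ascending
    return eigvecs[:, np.argsort(eigvals)[::-1]]


def prune_keys(keys, rotation, keep):
    """Rotate Keys, then keep the same leading `keep` channels for every token."""
    return (keys @ rotation)[:, :keep]


def scores_from_pruned(query, pruned_keys, rotation):
    """Scaled attention logits of one Query against channel-pruned Keys."""
    keep = pruned_keys.shape[1]
    head_dim = rotation.shape[0]
    # The Query is rotated with the same matrix and truncated to match.
    rotated_query = (query @ rotation)[:keep]
    return pruned_keys @ rotated_query / np.sqrt(head_dim)


def make_toy_keys(rng, num_tokens=256, head_dim=64, rank=8, noise=0.01):
    """Synthetic Keys that are approximately low rank (hypothetical toy data)."""
    signal = rng.standard_normal((num_tokens, rank)) @ rng.standard_normal((rank, head_dim))
    return signal + noise * rng.standard_normal((num_tokens, head_dim))


rng = np.random.default_rng(0)
keys = make_toy_keys(rng)
query = rng.standard_normal(64)
rotation = pca_rotation(keys)

full = keys @ query / np.sqrt(64)  # reference logits with all 64 channels
approx = scores_from_pruned(query, prune_keys(keys, rotation, 16), rotation)
rel_err = np.linalg.norm(full - approx) / np.linalg.norm(full)
print(f"relative logit error with 16 of 64 channels: {rel_err:.4f}")
```

In this toy setting, keeping 16 of 64 rotated channels stores a quarter of the Key entries per token, so the same Key budget could hold four times as many tokens. The sketch ignores the cost of storing the rotation matrix and says nothing about how real VLM Keys behave.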

Beomseok Kang, Dongwon Jo, Jiwon Song, Donghwee Son, Jae-Joon Kim • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Visual Question Answering | VizWiz | Accuracy 70.32 | 1820 |
| Visual Question Answering | ChartQA | Accuracy 79.56 | 519 |
| Visual Question Answering | TextVQA | TextVQA Accuracy 82.38 | 210 |
| Visual Question Answering | DocVQA | Accuracy 91.29 | 205 |
| Visual Question Answering | InfoVQA | Accuracy 70.51 | 195 |
| Visual Question Answering | Visual Question Answering Evaluation Suite TVQA, InfoVQA, ChartQA, DocVQA, VizWiz | TVQA Accuracy 82.38 | 26 |
| Open-ended generation | LLaVA-Bench In-the-Wild | Score 106.9 | 14 |
| Open-ended generation | MM-Vet | MM-Vet Score 44.04 | 14 |
