Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference

About

Vision-Language Models suffer severe KV cache pressure at inference, as a single image often encodes into thousands of tokens. Most existing methods exploit token sparsity through token pruning, but permanently discarding visual content causes substantial degradation on fine-grained perception tasks. This motivates a complementary axis, feature sparsity: under a fixed KV cache budget, compressing the channel dimension preserves more visual tokens at the same memory cost. Prior Key channel pruning methods, however, face a structural trade-off: token-wise channel pruning is expressive but unstructured and slow, while head-wise channel pruning is hardware-friendly but less robust. We resolve this with RotateK, a rotation-based structured Key channel pruning framework. RotateK applies an online PCA-based rotation that aligns token-dependent channel importance into a shared low-dimensional subspace, enabling accurate pruning under lightweight head-wise masks; a fused Triton attention kernel operates directly on sparse-channel Keys for efficient decoding. Experiments on two representative VLM backbones show that RotateK consistently outperforms prior Key channel pruning methods in both accuracy and decoding latency, while joint token-channel pruning improves over token-only baselines at matched KV cache budgets.
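The rotation idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`pca_rotation`, `prune_keys`, `pruned_scores`), the use of an uncentered second-moment matrix, the synthetic low-rank Keys, and the choice of keeping 32 of 64 channels are all assumptions made for illustration, and the fused Triton kernel is not reproduced. The sketch only shows why rotating Keys (and the Query) by an orthogonal PCA basis lets a single head-wise mask, "keep the first `keep` channels for every token", approximate the full attention scores.

```python
import numpy as np

def pca_rotation(keys):
    """Orthogonal basis (d x d) of principal directions, highest energy first.

    Hypothetical helper: uses the uncentered second moment of the Keys,
    since attention scores depend on raw dot products.
    """
    second_moment = keys.T @ keys / keys.shape[0]
    eigvals, eigvecs = np.linalg.eigh(second_moment)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # sort descending
    return eigvecs[:, order]

def prune_keys(keys, rotation, keep):
    """Rotate Keys, then keep the same leading channels for every token."""
    return (keys @ rotation)[:, :keep]

def pruned_scores(query, pruned_keys, rotation):
    """Attention logits from pruned Keys; the Query gets the same rotation."""
    keep = pruned_keys.shape[1]
    q_rot = (query @ rotation)[:keep]
    return pruned_keys @ q_rot

# Synthetic Keys for one head: mostly low-rank, plus small noise (assumption).
rng = np.random.default_rng(0)
tokens, d, rank, keep = 256, 64, 16, 32
basis = rng.standard_normal((rank, d))
keys = rng.standard_normal((tokens, rank)) @ basis
keys += 0.01 * rng.standard_normal((tokens, d))
query = rng.standard_normal(d)

R = pca_rotation(keys)
small_keys = prune_keys(keys, R, keep)          # stored in the KV cache
full = keys @ query                             # reference logits
approx = pruned_scores(query, small_keys, R)    # logits from pruned Keys
rel_err = np.linalg.norm(full - approx) / np.linalg.norm(full)
print(small_keys.shape, rel_err)
```

In this toy setting the pruned Key cache holds half the channels per token, so the same Key memory could hold roughly twice as many tokens. How well this carries over to real VLM Keys depends on how low-rank they actually are after rotation.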

Beomseok Kang, Dongwon Jo, Jiwon Song, Donghwee Son, Jae-Joon Kim • 2026

Related benchmarks

Task	Dataset	Result
Visual Question Answering	VizWiz	Accuracy 70.32	1863
Visual Question Answering	ChartQA	Accuracy 79.56	620
Visual Question Answering	InfoVQA	Accuracy 70.51	264
Visual Question Answering	TextVQA	TextVQA Accuracy 82.38	210
Visual Question Answering	DocVQA	Accuracy 91.29	205
Visual Question Answering	Visual Question Answering Evaluation Suite TVQA, InfoVQA, ChartQA, DocVQA, VizWiz	TVQA Accuracy 82.38	26
Open-ended generation	LLaVA-Bench In-the-Wild	Score 106.9	14
Open-ended generation	MM-Vet	MM-Vet Score 44.04	14

