xKV: Cross-Layer KV-Cache Compression via Aligned Singular Vector Extraction

About

Long-context Large Language Models (LLMs) enable powerful applications but incur high memory costs due to the key-value states (KV-Cache). Recent studies attempt to share KV-Cache across layers, but these approaches either require expensive pretraining or rely on per-token cross-layer cosine similarity that is often limited in practice. We show, via Centered Kernel Alignment (CKA), that the dominant singular vectors of KV-Cache are well aligned across layers. Motivated by this observation, we propose xKV, a post-training compression method that jointly factorizes grouped-layer KV-Cache into a shared low-rank subspace, substantially reducing KV-Cache memory. Across widely used LLMs, xKV achieves up to 8x KV-Cache compression while preserving accuracy on long-context tasks and in multi-turn settings. To further improve efficiency, we introduce Selective Reconstruction (SR) at decode time. Combined with SR, xKV achieves up to 4.23x end-to-end speedup over the full attention baseline, and surpasses notable baselines with 30% higher throughput under a similar accuracy level. Overall, xKV provides a plug-and-play approach to reduce both memory and latency for long-context LLM inference. Our code is publicly available at: https://github.com/abdelfattah-lab/xKV.
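The joint factorization described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes the caches of a layer group are concatenated along the feature axis and factorized with a plain truncated SVD, giving one shared token basis plus a small per-layer reconstruction matrix. The function name `joint_low_rank`, the synthetic data, and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def joint_low_rank(kv_layers, rank):
    """Factorize a group of per-layer caches into one shared low-rank basis.

    kv_layers: list of G arrays, each of shape (num_tokens, hidden_dim);
               all layers are assumed to share the same hidden_dim.
    Returns (shared, per_layer): shared has shape (num_tokens, rank) and
    per_layer is a list of G matrices of shape (rank, hidden_dim).
    """
    # Assumption: layers are stacked along the feature axis -> (T, G * d).
    stacked = np.concatenate(kv_layers, axis=1)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    # Shared token basis, scaled by the top singular values.
    shared = u[:, :rank] * s[:rank]
    # Slice the right factor back into one reconstruction matrix per layer.
    per_layer = np.split(vt[:rank], len(kv_layers), axis=1)
    return shared, per_layer


# Synthetic group of 4 layers whose caches share an 8-dimensional subspace.
rng = np.random.default_rng(0)
num_tokens, hidden_dim, group_size, true_rank = 256, 32, 4, 8
base = rng.normal(size=(num_tokens, true_rank))
layers = [base @ rng.normal(size=(true_rank, hidden_dim)) for _ in range(group_size)]

shared, per_layer = joint_low_rank(layers, rank=true_rank)

# Each layer is recovered from the shared basis and its own small matrix.
max_err = max(np.abs(shared @ p - x).max() for p, x in zip(per_layer, layers))

# Stored values before and after factorization.
original_size = sum(x.size for x in layers)
compressed_size = shared.size + sum(p.size for p in per_layer)
print(f"max reconstruction error: {max_err:.2e}")
print(f"stored values: {original_size} -> {compressed_size}")
```

In this toy setup the layers share an exact low-rank subspace by construction, so reconstruction is lossless up to floating-point error. Real KV-Cache is only approximately low-rank, so the chosen rank trades memory against accuracy.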

Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Hung-Yueh Chiang, Yash Akhauri, Xilai Dai, Huiqiang Jiang, Yucheng Li, Luis Ceze, Kai-Chiang Wu, Mohamed S. Abdelfattah • 2025

Related benchmarks

Task	Dataset	Metric	Value	
Mathematical Reasoning	GSM8K	Accuracy	61.9	1424
Long-context language modeling	LongBench	Average Score	42.69	369
Multi-task Language Understanding	MMLU	Accuracy	63.9	353
Document Question Answering	Qasper	Accuracy	35.6	44
Long-context evaluation	RULER 64k	VT Score	86.67	43
Key-Value Retrieval	LITM (Lost in the Middle)	Accuracy	99.9	33
Variable Tracking	RULER-VT	Accuracy	99.8	33
Long-context Language Understanding	RULER 64k context length	FWE (Error)	78.47	22
Long-context evaluation	LongBench (test)	NarQA Score	32.85	18
Long-context Language Understanding	LongBench 1 host v1 (test)	2WQA Score	39.53	14

Showing 10 of 12 rows
