
xKV: Cross-Layer KV-Cache Compression via Aligned Singular Vector Extraction

About

Long-context Large Language Models (LLMs) enable powerful applications but incur high memory costs due to the key-value states (KV-Cache). Recent studies attempt to share KV-Cache across layers, but these approaches either require expensive pretraining or rely on per-token cross-layer cosine similarity that is often limited in practice. We show, via Centered Kernel Alignment (CKA), that the dominant singular vectors of KV-Cache are well aligned across layers. Motivated by this observation, we propose xKV, a post-training compression method that jointly factorizes grouped-layer KV-Cache into a shared low-rank subspace, substantially reducing KV-Cache memory. Across widely used LLMs, xKV achieves up to 8x KV-Cache compression while preserving accuracy on long-context tasks and in multi-turn settings. To further improve efficiency, we introduce Selective Reconstruction (SR) at decode time. Combined with SR, xKV achieves up to 4.23x end-to-end speedup over the full attention baseline, and surpasses notable baselines with 30% higher throughput under a similar accuracy level. Overall, xKV provides a plug-and-play approach to reduce both memory and latency for long-context LLM inference. Our code is publicly available at: https://github.com/abdelfattah-lab/xKV.

Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Hung-Yueh Chiang, Yash Akhauri, Xilai Dai, Huiqiang Jiang, Yucheng Li, Luis Ceze, Kai-Chiang Wu, Mohamed S. Abdelfattah • 2025
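
The joint factorization described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes that each layer's cache is a (tokens x features) matrix, that the layers in a group are concatenated along the feature axis, and that a truncated SVD yields one token basis shared by the group plus a small reconstruction matrix per layer. The names `joint_low_rank` and `reconstruct` are hypothetical, and the toy data is synthetic.

```python
import numpy as np


def joint_low_rank(caches, rank):
    """Factorize a group of per-layer caches into one shared basis.

    caches: list of (T, d) arrays, one per layer in the group (assumed layout).
    Returns a shared (T, rank) matrix and one (rank, d) matrix per layer.
    """
    d = caches[0].shape[1]
    # Assumption: layers are stacked along the feature axis -> (T, G * d).
    stacked = np.concatenate(caches, axis=1)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    # Shared token-side factor, with singular values folded in.
    shared = u[:, :rank] * s[:rank]
    # Layer-specific slices of the right singular vectors.
    per_layer = [vt[:rank, i * d:(i + 1) * d] for i in range(len(caches))]
    return shared, per_layer


def reconstruct(shared, per_layer, layer):
    """Rebuild the approximate (T, d) cache of one layer."""
    return shared @ per_layer[layer]


# Synthetic group of 4 layers whose caches share an 8-dimensional subspace.
rng = np.random.default_rng(0)
T, d, G, true_rank = 64, 16, 4, 8
basis = rng.standard_normal((T, true_rank))
caches = [basis @ rng.standard_normal((true_rank, d)) for _ in range(G)]

shared, per_layer = joint_low_rank(caches, rank=true_rank)
max_err = max(
    np.abs(reconstruct(shared, per_layer, i) - caches[i]).max() for i in range(G)
)
stored = shared.size + sum(p.size for p in per_layer)
original = sum(c.size for c in caches)
print(f"max reconstruction error: {max_err:.2e}")
print(f"stored / original elements: {stored} / {original}")
```

Because the toy caches are built to share a subspace exactly, the truncated factorization reconstructs them up to floating-point error. Real KV-Cache is only approximately low-rank, so the chosen rank trades memory against accuracy.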

Related benchmarks

Task                                 Dataset                     Result               Rank
Mathematical Reasoning               GSM8K                       Accuracy 61.9        1398
Multi-task Language Understanding    MMLU                        Accuracy 63.9        353
Long-context language modeling       LongBench                   Average Score 42.69  328
Document Question Answering          Qasper                      Accuracy 35.6        44
Long-context evaluation              RULER 64k                   VT Score 86.67       43
Key-Value Retrieval                  LITM (Lost in the Middle)   Accuracy 99.9        33
Variable Tracking                    RULER-VT                    Accuracy 99.8        33
Long-context Language Understanding  RULER 64k context length    FWE (Error) 78.47    22
Long-context evaluation              LongBench (test)            NarQA Score 32.85    18
Long-context Language Understanding  LongBench 1 host v1 (test)  2WQA Score 39.53     14
Showing 10 of 12 rows
