KV Cache Transform Coding for Compact Storage in LLM Inference
About
Serving large language models (LLMs) at scale necessitates efficient key-value (KV) cache management. KV caches can be reused across conversation turns via shared-prefix prompts that are common in iterative code editing and chat. However, stale caches consume scarce GPU memory, require offloading, or force recomputation. We present KVTC, a lightweight transform coder that compresses KV caches for compact on-GPU and off-GPU storage. Drawing on classical media compression, KVTC combines PCA-based feature decorrelation, adaptive quantization, and entropy coding. It requires only a brief initial calibration and leaves model parameters unchanged. By exploiting redundancies in KV caches, KVTC achieves up to 20$\times$ compression while maintaining reasoning and long-context accuracy, and 40$\times$ or higher for specific use cases. We test KVTC with Llama 3, Mistral NeMo, and R1-Qwen 2.5 models across benchmarks including AIME25, GSM8K, LiveCodeBench, LongBench, MATH-500, MMLU, Qasper and RULER. It consistently outperforms inference-time baselines such as token eviction, quantization, and SVD-based methods, while achieving higher compression ratios. These results support KVTC as a practical building block for memory-efficient LLM serving with reusable KV caches.
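The three stages named in the abstract (PCA-based decorrelation, adaptive quantization, entropy coding) can be illustrated with a minimal sketch. This is not the KVTC implementation: the names `Codec`, `calibrate`, `encode`, `decode`, and `demo` are hypothetical, the bit-allocation rule (one bit fewer per halving of a component's standard deviation) is an assumed heuristic, `zlib` (DEFLATE) stands in for whatever entropy coder the authors use, and the input is synthetic correlated data rather than real KV caches. The compression ratio is computed against an assumed 16-bit baseline.

```python
import zlib
from typing import NamedTuple

import numpy as np


class Codec(NamedTuple):
    mean: np.ndarray   # (dim,) mean of the calibration vectors
    basis: np.ndarray  # (k, dim) rows are principal directions
    step: np.ndarray   # (k,) quantizer step size per component
    lo: np.ndarray     # (k,) lowest integer level per component
    hi: np.ndarray     # (k,) highest integer level per component


def calibrate(calib: np.ndarray, max_bits: int = 8, clip_sigmas: float = 4.0) -> Codec:
    """Fit a PCA basis and per-component quantizers (max_bits must be <= 8)."""
    mean = calib.mean(axis=0)
    # SVD of the centered data: rows of vt are principal directions,
    # ordered by decreasing variance.
    _, s, vt = np.linalg.svd(calib - mean, full_matrices=False)
    std = s / np.sqrt(calib.shape[0] - 1)
    # Assumed heuristic: one bit fewer for each halving of the standard
    # deviation relative to the strongest component; 0 bits drops it.
    rel = np.maximum(std / max(std[0], 1e-12), 1e-12)
    bits = np.clip(np.rint(max_bits + np.log2(rel)), 0, max_bits).astype(int)
    levels = 2 ** bits
    step = np.maximum(2 * clip_sigmas * std / levels, 1e-12)
    return Codec(mean, vt, step, -(levels // 2), (levels + 1) // 2 - 1)


def encode(x: np.ndarray, codec: Codec):
    """Decorrelate, quantize to int8, then entropy-code with DEFLATE."""
    coeffs = (x - codec.mean) @ codec.basis.T
    q = np.clip(np.rint(coeffs / codec.step), codec.lo, codec.hi).astype(np.int8)
    return zlib.compress(q.tobytes(), 9), q.shape


def decode(blob: bytes, shape, codec: Codec) -> np.ndarray:
    """Invert the entropy coding, dequantize, and rotate back."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(shape)
    return (q.astype(np.float64) * codec.step) @ codec.basis + codec.mean


def demo(seed: int = 0):
    """Round-trip synthetic low-rank data; return (ratio, relative error)."""
    rng = np.random.default_rng(seed)
    mixing = rng.standard_normal((8, 64))

    def sample(n: int) -> np.ndarray:
        # 64-dim vectors driven by 8 latent factors plus small noise.
        return rng.standard_normal((n, 8)) @ mixing + 0.01 * rng.standard_normal((n, 64))

    codec = calibrate(sample(2048))      # one-off calibration pass
    x = sample(4096)                     # unseen vectors to compress
    blob, shape = encode(x, codec)
    x_hat = decode(blob, shape, codec)
    ratio = (x.size * 2) / len(blob)     # baseline: 2 bytes per value
    rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
    return ratio, rel_err


if __name__ == "__main__":
    ratio, rel_err = demo()
    print(f"compression ratio: {ratio:.1f}x, relative error: {rel_err:.4f}")
```

The calibration step runs once and touches no model weights, which mirrors the abstract's description. The ratio and error on this toy data say nothing about real KV caches, where the achievable compression depends on how much redundancy the keys and values actually carry.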
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 62.5 | 1362 |
| Multi-task Language Understanding | MMLU | Accuracy | 64.6 | 321 |
| Mathematical Reasoning | MATH 500 | pass@1 | 74.41 | 239 |
| Mathematics | AIME 2024 | Accuracy | 52.5 | 60 |
| Document Question Answering | Qasper | Accuracy | 40.7 | 44 |
| Key-Value Retrieval | LITM (Lost in the Middle) | Accuracy | 99.9 | 33 |
| Variable Tracking | RULER-VT | Accuracy | 99.5 | 33 |
| Long-context Language Understanding | LongBench 1 host v1 (test) | 2WQA Score | 46.23 | 14 |
| Coding | LiveCodeBench | Accuracy | 36.5 | 8 |
| Long-context Language Understanding | RULER 0 shot v1 (test) | CWE Score | 92.41 | 7 |