
CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation

About

Large Language Models (LLMs) in long-context scenarios are severely constrained by the linear growth of Key-Value (KV) cache memory. Existing KV compression methods rely either on static thresholds and attention-only heuristics or on coarse memory-budget allocation. Under tight memory budgets, these methods overlook two key factors: prompt-dependent variation in compression risk and functional heterogeneity across attention heads. Overlooking these factors destabilizes token selection and leads to tail failures. To address these challenges, we propose CompilerKV, a risk-adaptive and head-aware compression framework that compiles offline experience into reusable decision tables for prefill-only deployment. CompilerKV integrates two synergistic components: (i) a Head Heterogeneity Table, learned via offline contextual bandits, which assigns head-specific reliability weights to explicitly govern functional differences across attention heads; and (ii) a Risk-Adaptive Threshold Gating mechanism that jointly models attention entropy and local perplexity, transforming prompt-level risk into deployable retention thresholds. Experiments on LongBench show that CompilerKV outperforms state-of-the-art methods under a 512-token budget, recovering 97.7% of FullKV performance while achieving gains of up to +5.2 points over the strongest competitor.
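The two components above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear mapping from risk to threshold, and the coefficients (`base`, `alpha`, `beta`) are all illustrative assumptions; only the ingredients (attention entropy, local perplexity, head-specific reliability weights) come from the abstract.

```python
# Hedged sketch of risk-adaptive KV retention in the spirit of CompilerKV.
# Assumptions (not from the paper): the linear risk-to-threshold mapping,
# all coefficient values, and the weighted-sum token scoring rule.
import numpy as np

def attention_entropy(attn):
    """Mean row entropy of attention; attn: (heads, queries, keys), rows sum to 1."""
    p = np.clip(attn, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum(axis=-1).mean())

def local_perplexity(token_nll):
    """Perplexity from per-token negative log-likelihoods of the prompt."""
    return float(np.exp(np.mean(token_nll)))

def retention_threshold(entropy, ppl, base=0.5, alpha=0.1, beta=0.05):
    """Map prompt-level risk to a retention threshold: higher entropy or
    perplexity means higher risk, so a lower threshold (retain more tokens).
    The linear form and coefficients are illustrative assumptions."""
    risk = alpha * entropy + beta * np.log(ppl)
    return max(0.0, base - risk)

def select_tokens(scores, head_weights, tau):
    """Keep key positions whose head-weighted importance exceeds tau.
    scores: (heads, keys); head_weights: (heads,) per-head reliability,
    standing in for an offline-learned Head Heterogeneity Table."""
    weighted = (head_weights[:, None] * scores).sum(axis=0) / head_weights.sum()
    return np.nonzero(weighted >= tau)[0]

# Toy example: 8 heads, 4 query positions, 16 key positions.
rng = np.random.default_rng(0)
attn = rng.dirichlet(np.ones(16), size=(8, 4))   # rows sum to 1
scores = attn.mean(axis=1)                        # per-head token importance
tau = retention_threshold(attention_entropy(attn),
                          local_perplexity(rng.random(4)))
kept = select_tokens(scores, head_weights=np.ones(8), tau=tau)
```

In a real deployment the head weights would be read from the compiled offline table rather than set uniformly, and the threshold mapping would be whatever the gating mechanism learned; the sketch only shows how the two signals compose at prefill time.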

Ning Yang, Chengzhi Wang, Yibo Liu, Baoliang Tian, Haijun Zhang • 2026

Related benchmarks

Task                               Dataset                 Result                    Rank
Single-Doc Question Answering      LongBench               MultifieldQA Score 43.2   36
Long-context Understanding         LongBench 1.0 (test)    NarrativeQA 26.06         32
Few-shot Learning                  LongBench               TREC Score 70.5           12
Summarization                      LongBench               GovRep Score 21.62        12
Multi-document Question Answering  LongBench               HotpotQA Acc 41.37        6
Code Analysis                      LongBench               Lcc Score 57.91           4
Long-context Evaluation            LongBench               Average Score 37.97       4
Synthetic Tasks                    LongBench               PCount 6.5                4
