
CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation

About

Prefill-only KV compression freezes a token subset at the end of prefill and decodes from it without further eviction. The retention decision is therefore irreversible, yet existing methods estimate the corrective signals it relies on, per-head reliability and prompt-level compression sensitivity, online from a single noisy prompt. We argue this is the wrong statistical unit: these signals exhibit far higher cross-prompt regularity than within-prompt signal-to-noise. We introduce CompilerKV, a KV-retention policy whose corrective tables are compiled offline from a calibration corpus, reducing online correction after the standard observation-window scan to $O(1)$ lookups plus a budget clamp. We find that compiled retention tables behave as portable architectural priors: rankings transfer across disjoint corpora on four backbones (mean Spearman $\bar\rho = 0.90$), and direct model-to-model table transfer costs only $0.4$–$0.8$ LongBench points on average. At a 512-token budget, CompilerKV attains compressed-SOTA on all four backbones, improving over the strongest prefill-only baseline by $+1.67$ points on average (task-bootstrap 95% CI $[+1.08, +2.37]$). Pressure regimes amplify the gap: under a fixed $512/32k$ cache ratio, CompilerKV remains the strongest compressed method through 128k RULER ($\sim 73$ vs. FullKV $\sim 79$, SnapKV $\sim 38$); on 32k NIAH it reaches $0.89$ vs. SnapKV $0.42$; and at 32k input, retaining only $1.56\%$ of the prefill KV, batch-16 serving remains feasible where FullKV is OOM.
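As a rough illustration of the "lookup plus budget clamp" idea, and not the authors' implementation, the sketch below combines per-head token scores using weights read from a precomputed table, then keeps the top tokens up to a fixed budget. The table contents, the `(layer, head)` key layout, the weighted-sum rule, and the names `RELIABILITY`, `select_tokens`, and `retention_ratio` are all assumptions made for this example. It also checks the quoted $1.56\%$ figure, which corresponds to 512 retained tokens out of 32768 (taking 32k to mean 32 × 1024).

```python
import heapq

def retention_ratio(budget, context_len):
    """Fraction of prefill KV entries kept under a fixed token budget."""
    return budget / context_len

# Hypothetical offline-compiled table: (layer, head) -> reliability weight.
# In this sketch the values are invented; the abstract does not specify them.
RELIABILITY = {(0, 0): 1.0, (0, 1): 0.25}

def select_tokens(layer, head_scores, budget, table=RELIABILITY, default=1.0):
    """Return sorted indices of tokens to retain for one layer.

    head_scores: dict head -> list of per-token scores, assumed to come
                 from an observation-window scan performed elsewhere.
    budget:      maximum number of tokens to keep.
    """
    if not head_scores:
        return []
    n = len(next(iter(head_scores.values())))
    combined = [0.0] * n
    for head, scores in head_scores.items():
        w = table.get((layer, head), default)  # O(1) table lookup
        for t, s in enumerate(scores):
            combined[t] += w * s
    k = max(0, min(budget, n))  # budget clamp
    keep = heapq.nlargest(k, range(n), key=combined.__getitem__)
    return sorted(keep)

scores = {0: [0.1, 0.9, 0.2, 0.4], 1: [0.8, 0.1, 0.9, 0.2]}
kept = select_tokens(0, scores, budget=2)           # table down-weights head 1
kept_uniform = select_tokens(0, scores, 2, table={})  # every head weighted 1.0
ratio = retention_ratio(512, 32 * 1024)             # 0.015625, i.e. about 1.56%
```

With the toy table, head 1 is down-weighted and tokens 1 and 3 are kept; with uniform weights, tokens 1 and 2 are kept instead. The only point of the example is that a fixed table can change which tokens survive without any extra per-prompt estimation.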

Ning Yang, Chengzhi Wang, Yibo Liu, Baoliang Tian, Haijun Zhang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Long-context evaluation | LongBench | Average Score 37.97 | 90 |
| Long-context Understanding | LongBench 1.0 (test) | NarrativeQA 26.06 | 84 |
| Single-Doc Question Answering | LongBench | MultifieldQA Score 43.2 | 75 |
| Few-shot Learning | LongBench | TREC Score 70.5 | 51 |
| Summarization | LongBench | GovRep Score 21.62 | 51 |
| Multi-document Question Answering | LongBench | HotpotQA Acc 41.37 | 45 |
| Code Analysis | LongBench | Lcc Score 57.91 | 43 |
| Synthetic Tasks | LongBench | PCount 6.5 | 43 |
