Adaptive Mass-Segmented KV Compression for Long-Context Reasoning

About

The linear growth of the Key-Value (KV) cache is a critical bottleneck in long-form LLM inference. Existing KV compression methods mitigate this by evicting tokens based on importance scores. However, we show that their reliance on global Top-k selection triggers Region Wipe-out: the severe eviction of contiguous reasoning blocks that derails logical coherence. To address this, we propose Adaptive Mass-Segmented (AMS) KV Compression, a framework that shifts the paradigm from token-level competition to region-aware quota allocation. AMS adaptively partitions the KV cache based on the spatial distribution of attention mass, ensuring structurally vital reasoning segments receive guaranteed memory quotas. To ensure stability during iterative decoding, an EMA-based smoothing mechanism is incorporated to prevent jitter in segment boundaries. Crucially, AMS is a universal plug-and-play layer that is orthogonal to existing scorers. It can be seamlessly integrated into representative methods such as TOVA, Expected Attention, KeyDiff, R-KV and TriAttention. AMS is also system-compatible with modern paged-KV serving frameworks such as vLLM, supporting efficient gather-and-compact KV execution without introducing additional steady-state attention overhead. Extensive experiments across a diverse suite of tasks, including mathematical reasoning (MATH500, AIME, GSM8K), code completion, open-domain QA, and sparse retrieval, demonstrate that AMS consistently mitigates structural fragmentation and boosts model performance.
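The abstract does not give the exact allocation rule, so the following is only a minimal sketch of the contrast it describes between global Top-k eviction and region-aware quotas. Several things here are assumptions rather than the authors' method: segments are cut so that each holds roughly equal attention mass, the budget is split equally across segments, any unused quota is refilled by a global Top-k over the remaining tokens, and boundary smoothing is a plain EMA with a hypothetical `alpha`. All function names are illustrative.

```python
import numpy as np


def global_topk(scores, budget):
    """Baseline: keep the `budget` highest-scoring tokens, wherever they sit."""
    return np.sort(np.argsort(scores)[-budget:])


def mass_segments(attn_mass, num_segments):
    """Cut positions into contiguous segments of roughly equal attention mass.

    Returns num_segments + 1 non-decreasing boundaries (first 0, last n).
    Assumed rule, not taken from the paper.
    """
    n = len(attn_mass)
    cdf = np.cumsum(attn_mass) / np.sum(attn_mass)
    targets = np.arange(1, num_segments) / num_segments
    inner = np.searchsorted(cdf, targets, side="left") + 1
    bounds = np.concatenate(([0], inner, [n]))
    return np.maximum.accumulate(np.clip(bounds, 0, n))


def segment_quota_select(scores, bounds, budget):
    """Give every non-empty segment an equal quota, then run Top-k inside it."""
    n = len(scores)
    segs = [(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:]) if b > a]
    base = budget // len(segs)
    keep = np.zeros(n, dtype=bool)
    for a, b in segs:
        q = min(base, b - a)  # a short segment cannot use more than its length
        if q > 0:
            keep[np.argsort(scores[a:b])[-q:] + a] = True
    # Refill any unused budget with the best tokens not yet kept.
    leftover = budget - int(keep.sum())
    if leftover > 0:
        masked = np.where(keep, -np.inf, scores)
        keep[np.argsort(masked)[-leftover:]] = True
    return np.flatnonzero(keep)


def ema_bounds(prev_bounds, new_bounds, alpha=0.3):
    """Smooth segment boundaries across decoding steps to limit jitter."""
    prev = np.asarray(prev_bounds, dtype=float)
    new = np.asarray(new_bounds, dtype=float)
    smoothed = np.rint(alpha * new + (1 - alpha) * prev).astype(int)
    # The outer boundaries always track the current cache extent.
    smoothed[0], smoothed[-1] = int(new[0]), int(new[-1])
    return smoothed
```

In this sketch, any scorer can supply `scores` (the abstract names TOVA, Expected Attention, KeyDiff, R-KV and TriAttention as compatible scorers), while the segmentation and quota step only decides where the budget may be spent. When one region dominates the scores, `global_topk` can keep nothing from the other regions, whereas `segment_quota_select` keeps at least a few tokens from every non-empty segment.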

Junzhe Yang, Xiaoyu Shen • 2026

Related benchmarks

Task                             Dataset       Result                          Rank
Mathematical Reasoning           GSM8K         --                              204
Mathematical Reasoning           AIME 25       Pass@1 Accuracy 33.33           178
Code Completion                  RepoBench-P   Similarity 0.2744               17
Needle-In-A-Haystack Retrieval   NIAH          NIAH Score 23.84                14
Long-context language modeling   LongBench     LongBench Average Score 19.3    12
Question Answering               TriviaQA      TriviaQA Accuracy 70.88         7
