Adaptive Mass-Segmented KV Compression for Long-Context Reasoning

About

The linear growth of the Key-Value (KV) cache is a critical bottleneck in long-form LLM inference. Existing KV compression methods mitigate this by evicting tokens based on importance scores. However, we show that their reliance on global Top-k selection triggers Region Wipe-out: the severe eviction of contiguous reasoning blocks that derails logical coherence. To address this, we propose Adaptive Mass-Segmented (AMS) KV Compression, a framework that shifts the paradigm from token-level competition to region-aware quota allocation. AMS adaptively partitions the KV cache based on the spatial distribution of attention mass, ensuring structurally vital reasoning segments receive guaranteed memory quotas. To ensure stability during iterative decoding, an EMA-based smoothing mechanism is incorporated to prevent jitter in segment boundaries. Crucially, AMS is a universal plug-and-play layer that is orthogonal to existing scorers. It can be seamlessly integrated into representative methods such as TOVA, Expected Attention, KeyDiff, R-KV and TriAttention. AMS is also system-compatible with modern paged-KV serving frameworks such as vLLM, supporting efficient gather-and-compact KV execution without introducing additional steady-state attention overhead. Extensive experiments across a diverse suite of tasks, including mathematical reasoning (MATH500, AIME, GSM8K), code completion, open-domain QA, and sparse retrieval, demonstrate that AMS consistently mitigates structural fragmentation and boosts model performance.
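
The contrast between global Top-k eviction and region-aware quotas can be illustrated with a small sketch. The abstract does not specify the actual AMS algorithm, so everything below is an assumption made for illustration: the equal-mass partitioning rule, the even split of the budget across segments, the EMA coefficient `alpha`, and all function names (`mass_boundaries`, `smooth_boundaries`, `select_with_quotas`, `global_top_k`) are hypothetical. The scores are made-up integers standing in for whatever importance score an existing scorer would produce.

```python
from bisect import bisect_left
from itertools import accumulate


def global_top_k(scores, budget):
    """Baseline: keep the `budget` highest-scoring tokens, wherever they sit."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def mass_boundaries(scores, num_segments):
    """Assumed rule: cut the sequence so each segment holds roughly equal score mass.

    Requires len(scores) >= num_segments. Returns num_segments + 1 boundaries.
    """
    total = sum(scores)
    cum = list(accumulate(scores))
    bounds = [0]
    for j in range(1, num_segments):
        target = total * j / num_segments
        idx = bisect_left(cum, target) + 1  # cut just after the mass target is reached
        # Keep every segment non-empty and leave room for the remaining cuts.
        idx = min(max(idx, bounds[-1] + 1), len(scores) - (num_segments - j))
        bounds.append(idx)
    bounds.append(len(scores))
    return bounds


def smooth_boundaries(prev, curr, alpha=0.8):
    """Assumed EMA step: blend the previous boundaries with the new ones to damp jitter."""
    if prev is None:
        return list(curr)
    mixed = [round(alpha * p + (1 - alpha) * c) for p, c in zip(prev, curr)]
    mixed[0], mixed[-1] = 0, curr[-1]  # the ends always track the current cache length
    return mixed


def select_with_quotas(scores, bounds, budget):
    """Give each segment its own quota, then run Top-k inside the segment only."""
    num_segments = len(bounds) - 1
    base, extra = divmod(budget, num_segments)
    keep = []
    for s in range(num_segments):
        lo, hi = bounds[s], bounds[s + 1]
        quota = base + (1 if s < extra else 0)
        ranked = sorted(range(lo, hi), key=lambda i: scores[i], reverse=True)
        keep.extend(ranked[:quota])  # a short segment may use less than its quota
    return sorted(keep)


# Made-up scores: a high-mass opening block followed by lower-mass later tokens.
scores = [30, 25, 28, 22, 5, 4, 6, 3, 12, 10, 11, 8]
budget = 4

kept_global = global_top_k(scores, budget)
bounds = mass_boundaries(scores, num_segments=3)
kept_quota = select_with_quotas(scores, bounds, budget)

print(kept_global)  # [0, 1, 2, 3]: every token from position 4 onward is evicted
print(bounds)       # [0, 2, 5, 12]
print(kept_quota)   # [0, 1, 2, 8]: the last segment keeps its best token
```

In this toy case the global selection spends the whole budget on the first four positions, which is the kind of contiguous loss the abstract calls Region Wipe-out, while the per-segment quotas leave at least one token in the late span. In an iterative decoding loop, `smooth_boundaries` would be applied to the output of `mass_boundaries` at each compression step before selection.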

Junzhe Yang, Xiaoyu Shen • 2026

Related benchmarks

Task	Dataset	Metric	Value	
Mathematical Reasoning	GSM8K	--	--	220
Mathematical Reasoning	AIME 25	Pass@1 Accuracy	33.33	190
Code Completion	RepoBench-P	Similarity	0.2744	17
Needle-In-A-Haystack Retrieval	NIAH	NIAH Score	23.84	14
Long-context language modeling	LongBench	LongBench Average Score	19.3	12
Question Answering	TriviaQA	TriviaQA Accuracy	70.88	7

