
Residual-Mass Accounting for Partial-KV Decoding

About

We study a controlled partial-KV decoding setting in which exact unnormalized softmax contributions are computed for sink/tail anchors and a retrieved token set, while the remaining prefill tokens are represented by a residual estimate. We focus on the accounting rule after the query-dependent exact support has been selected, and use exhaustive Top-K only as an oracle selector, not as a deployable retrieval system. The proposed rule leaves the backbone language model and the exact-branch KV tensors unchanged. It builds fixed-size summary states $(S,u)$ from learned positive feature maps $\phi$, subtracts retrieved-token feature contributions to keep the exact and residual sets non-overlapping, and merges the estimated residual numerator and denominator with the exact branch under one normalization. At a 1% exact-support budget, our residual-completion method improves over the selection-only Top-K baseline on RULER and BABILong across frozen 1B and 3B Llama-3.2-Instruct backbones at all reported context lengths. In the 0.5-4% exact-support budget sweeps, this trend largely persists. On LongBench, summarization results are mostly favorable, while multi-document QA is mixed. Attention-output diagnostics support retrieved-token subtraction as the partition-consistent accounting rule, while indicating that the main remaining error is imperfect learned-$\phi$ approximation of the unretrieved residual mass.

Yasuto Hoshi, Daisuke Miyashita, Jun Deguchi• 2026
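
The accounting rule summarized in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It rests on several assumptions that do not come from the source: `phi` is a fixed `elu(x) + 1` map standing in for the learned positive feature map $\phi$; the summary states $(S,u)$ are built over all non-anchor tokens as $S=\sum_j \phi(k_j)v_j^\top$ and $u=\sum_j \phi(k_j)$; and the function names `phi`, `oracle_topk`, and `partial_kv_attention` are hypothetical. The sketch handles a single query and a single head, and it recomputes $(S,u)$ inline for clarity, whereas a fixed-size summary would normally be built once at prefill.

```python
import numpy as np

def phi(x):
    # Hypothetical stand-in for the learned positive feature map: elu(x) + 1.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def oracle_topk(q, K, anchor_idx, k):
    # Exhaustive Top-K over non-anchor tokens (oracle selector, not deployable).
    scores = K @ q
    scores[np.asarray(anchor_idx)] = -np.inf
    return np.argsort(-scores)[:k]

def partial_kv_attention(q, K, V, anchor_idx, retrieved_idx):
    # Assumes anchor_idx and retrieved_idx are disjoint.
    anchor_idx = np.asarray(anchor_idx)
    retrieved_idx = np.asarray(retrieved_idx)
    d = q.shape[-1]

    # Summary states over non-anchor tokens (assumed construction).
    mask = np.ones(K.shape[0], dtype=bool)
    mask[anchor_idx] = False
    feat = phi(K[mask])
    S = feat.T @ V[mask]      # shape (d, d_v)
    u = feat.sum(axis=0)      # shape (d,)

    # Subtract retrieved-token features so exact and residual sets do not overlap.
    feat_r = phi(K[retrieved_idx])
    S_res = S - feat_r.T @ V[retrieved_idx]
    u_res = u - feat_r.sum(axis=0)

    # Exact branch: unnormalized softmax contributions for anchors + retrieved tokens.
    # (No max-subtraction here; a stable version must rescale both branches alike.)
    exact_idx = np.concatenate([anchor_idx, retrieved_idx])
    w = np.exp(K[exact_idx] @ q / np.sqrt(d))
    num_exact, den_exact = w @ V[exact_idx], w.sum()

    # Residual branch: estimated numerator and denominator of the unretrieved mass.
    fq = phi(q)
    num_res, den_res = fq @ S_res, fq @ u_res

    # Merge both branches under one normalization.
    return (num_exact + num_res) / (den_exact + den_res)
```

One property of this construction is that when the retrieved set covers every non-anchor token, the residual terms vanish and the output matches full softmax attention. With a partial retrieved set, the quality of the result depends on how well $\phi(q)^\top\phi(k)$ approximates the unnormalized softmax weight, which the fixed stand-in map here does not attempt to do.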

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Long-context language modeling | RULER | -- | 75 |
| Long-context language modeling | RULER 16k context | Accuracy (RULER 16K): 83 | 72 |
| Long-context Reasoning | BABILong 16k | Accuracy: 28.3 | 72 |
| Long-context language modeling evaluation | RULER Context Length = 8K | Average Accuracy (RULER 8K): 84.8 | 72 |
| Long-context Reasoning | BABILong 8K | Accuracy: 33 | 65 |
| Long-context Reasoning | BABILong 4K | Accuracy (BABILong 4k): 34.3 | 51 |
