Transactional Attention: Semantic Sponsorship for KV-Cache Retention

About

At K=16 tokens (0.4% of a 4K context), every existing KV-cache compression method achieves 0% on credential retrieval. The failure mode is dormant tokens: credentials, API keys, and configuration values that receive near-zero attention but become essential at generation time. Because these tokens lack the statistical signals that eviction policies rely on, no method based on attention scores, reconstruction loss, or learned retention gates retains them. We introduce Transactional Attention (TA), a sponsorship mechanism in which structural anchor patterns (e.g., "key:", "password:") protect adjacent value-bearing tokens from eviction. TA achieves 100% credential retrieval at K=16 where six baselines (H2O, TOVA, SnapKV, StreamingLLM, PyramidKV, DynamicKV) achieve 0%, and sustains 100% accuracy across 200 function-calling trials. TA-Fast, an attention-free variant, reduces memory overhead by 52% and is compatible with SDPA and FlashAttention. TA is orthogonal to existing compression methods and adds less than 1% latency overhead.
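The sponsorship idea can be illustrated with a small sketch. This is not the paper's implementation: the function names, the whitespace-level tokens, the regex, the one-token sponsorship span, and the toy scores are all assumptions made for illustration. It contrasts a score-only top-K eviction policy with one that first reserves cache slots for tokens immediately following an anchor pattern, then fills the remaining budget by score. (For scale, a budget of K=16 in a 4096-token context is 16/4096 ≈ 0.39% of the tokens.)

```python
import re

# Hypothetical anchor patterns. The abstract names "key:" and "password:"
# only as examples; the full pattern set is not given there.
ANCHOR_RE = re.compile(r"^(key|password):$", re.IGNORECASE)


def top_k_by_score(scores, budget):
    """Score-only baseline: keep the `budget` highest-scoring positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:budget])


def select_with_sponsorship(tokens, scores, budget, span=1):
    """Keep tokens that follow an anchor first, then fill by score.

    `span` (an assumed parameter) is how many tokens after each anchor
    are protected from eviction.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")

    # Collect positions sponsored by an anchor, in document order.
    sponsored = []
    for i, tok in enumerate(tokens):
        if ANCHOR_RE.match(tok):
            for j in range(i + 1, min(i + 1 + span, len(tokens))):
                if j not in sponsored:
                    sponsored.append(j)

    # Sponsored positions take slots first (truncated if over budget).
    kept = set(sponsored[:budget])
    remaining = budget - len(kept)

    # Fill whatever budget is left with the highest-scoring other positions.
    rest = sorted(
        (i for i in range(len(tokens)) if i not in kept),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept.update(rest[:remaining])
    return sorted(kept)


# Toy example: the credential "hunter2" has the lowest score (a dormant token).
tokens = ["the", "server", "password:", "hunter2", "is", "rotated", "weekly"]
scores = [0.30, 0.25, 0.02, 0.01, 0.20, 0.12, 0.10]

baseline = top_k_by_score(scores, budget=3)
sponsored = select_with_sponsorship(tokens, scores, budget=3)
```

In this toy input the score-only policy evicts position 3 (the credential), while the sponsored policy retains it at the cost of one score-ranked slot. Real tokenizers split credentials into several sub-word tokens, so an actual system would need a longer or adaptive span.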

Abhinaba Basu • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Language Modeling	WikiText-2	Perplexity (PPL)	27.2	2320
Needle-in-a-Haystack	Needle-in-a-haystack 4x original context	Accuracy	100	35
Adversarial Stress Test	Hard Mode stress tests	Similar Anchors (Prod/Dev)	35	3
Function Calling	Tool-call benchmark (n=200)	Accuracy (12-char)	100	3
