
S$^3$-Attention: Attention-Aligned Endogenous Retrieval for Memory-Bounded Long-Context Inference

About

Large language models are increasingly applied to multi-document and long-form inputs, yet long-context inference remains memory- and noise-inefficient. Key-value (KV) caching scales linearly with context length, while external retrieval methods often return lexically similar but causally irrelevant passages. We present S3-Attention, a memory-first inference-time framework that treats long-context processing as attention-aligned endogenous retrieval. S3-Attention decodes transient key and query projections into top-k sparse feature identifiers using lightweight sparse autoencoders, and constructs a CPU-based inverted index mapping features to token positions or spans during a single streaming scan. This design allows the KV cache to be discarded entirely and bounds GPU memory usage by the scan chunk size. At generation time, feature co-activation is used to retrieve compact evidence spans, optionally fused with BM25 for exact lexical matching. Under a unified LongBench evaluation protocol with fixed prompting, decoding, and matched token budgets, S3-Hybrid closely matches full-context inference across multiple model families and improves robustness in several information-dense settings. We also report an engineering limitation of the current prototype, which incurs higher wall-clock latency than optimized full-KV baselines, motivating future kernel-level optimization.
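The core mechanism the abstract describes — mapping each token's top-k sparse feature identifiers into a CPU inverted index during one streaming scan, then scoring spans by feature co-activation with the query at generation time — can be sketched as follows. This is a minimal illustration, not the authors' implementation: `encode_topk` is a hypothetical stand-in for the sparse-autoencoder projection step, and span scoring here is a plain co-activation count (the paper's optional BM25 fusion is omitted).

```python
from collections import defaultdict

def build_feature_index(chunks, encode_topk):
    """Build a CPU inverted index from sparse feature ids to token positions.

    `encode_topk(token_repr)` is a placeholder for the sparse-autoencoder
    step: it should return the ids of the top-k active features for one
    token. Chunks are processed in a single streaming pass, so only the
    current chunk needs to reside on the accelerator.
    """
    index = defaultdict(list)  # feature id -> absolute token positions
    pos = 0
    for chunk in chunks:
        for token_repr in chunk:
            for fid in encode_topk(token_repr):
                index[fid].append(pos)
            pos += 1
    return index

def retrieve_spans(index, query_feature_ids, span=16, top_n=3):
    """Score fixed-size spans by feature co-activation with the query
    and return the highest-scoring (start, end, score) evidence spans."""
    hits = defaultdict(int)  # span index -> co-activation count
    for fid in query_feature_ids:
        for p in index.get(fid, ()):
            hits[p // span] += 1
    best = sorted(hits.items(), key=lambda kv: -kv[1])[:top_n]
    return [(b * span, b * span + span, score) for b, score in best]
```

Note the design point this makes concrete: once features are indexed, the original key/value tensors are no longer needed for retrieval, which is why the KV cache can be discarded and GPU memory bounded by the scan chunk size.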

Qingsen Ma, Dianyun Wang, Yaoye Wang, Lechen Ning, Sujie Zhu, Xiaohang Zhang, Jiaming Lyu, Linhao Ren, Zhenbo Xu, Zhaofeng He • 2026

Related benchmarks

| Task | Dataset | Metric | Score | Rank |
| --- | --- | --- | --- | --- |
| Multi-hop Question Answering | HotpotQA | F1 | 47.14 | 221 |
| Long-context Language Understanding | LongBench | M-Avg | 24.87 | 219 |
| Multi-hop Question Answering | MuSiQue | -- | -- | 106 |
| Question Answering | NarrativeQA | F1 | 11.3 | 87 |
| Question Answering | Qasper | F1 | 21.87 | 61 |
| Multi-hop Question Answering | 2WikiMHQA | F1 | 17.56 | 55 |
| Summarization | MultiNews | ROUGE-L | 23.66 | 21 |
| Question Answering | MultiFieldQA-en | F1 | 43.54 | 21 |
| Summarization | GovReport | ROUGE-L | 19.55 | 21 |
