S$^3$-Attention: Attention-Aligned Endogenous Retrieval for Memory-Bounded Long-Context Inference
About
Large language models are increasingly applied to multi-document and long-form inputs, yet long-context inference remains memory- and noise-inefficient. Key-value (KV) caching scales linearly with context length, while external retrieval methods often return lexically similar but causally irrelevant passages. We present S3-Attention, a memory-first inference-time framework that treats long-context processing as attention-aligned endogenous retrieval. S3-Attention decodes transient key and query projections into top-k sparse feature identifiers using lightweight sparse autoencoders, and constructs a CPU-based inverted index mapping features to token positions or spans during a single streaming scan. This design allows the KV cache to be discarded entirely and bounds GPU memory usage by the scan chunk size. At generation time, feature co-activation is used to retrieve compact evidence spans, optionally fused with BM25 for exact lexical matching. Under a unified LongBench evaluation protocol with fixed prompting, decoding, and matched token budgets, S3-Hybrid closely matches full-context inference across multiple model families and improves robustness in several information-dense settings. We also report an engineering limitation of the current prototype, which incurs higher wall-clock latency than optimized full-KV baselines, motivating future kernel-level optimization.
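The core retrieval mechanism described above can be sketched in a few lines: each token position contributes its top-k sparse feature identifiers to a CPU-side inverted index, and at generation time positions are scored by how many features they co-activate with the query. The sketch below is illustrative only; the function names, the span heuristic, and the simple co-activation count are assumptions, not the paper's implementation (which decodes features with sparse autoencoders during a streaming scan and optionally fuses scores with BM25).

```python
from collections import defaultdict

def build_inverted_index(token_feature_ids):
    """Map each sparse feature id to the token positions where it fires.

    token_feature_ids: sequence whose entry t holds the top-k feature ids
    decoded from the key projection at position t (hypothetical stand-in
    for the SAE decoding step of the streaming scan).
    """
    index = defaultdict(list)
    for pos, feats in enumerate(token_feature_ids):
        for f in feats:
            index[f].append(pos)
    return index

def retrieve_spans(index, query_feature_ids, span_radius=2, top_n=3):
    """Score positions by feature co-activation with the query features,
    then return small evidence spans around the best-scoring positions.
    Ties are broken by earlier position; the radius-based span is a
    simplification of the paper's span construction."""
    scores = defaultdict(int)
    for f in query_feature_ids:
        for pos in index.get(f, ()):
            scores[pos] += 1
    best = sorted(scores, key=lambda p: (-scores[p], p))[:top_n]
    return [(max(0, p - span_radius), p + span_radius) for p in best]
```

Because the index lives on CPU and only feature ids (not key/value tensors) are kept, GPU memory is bounded by the chunk size of the streaming scan rather than the full context length.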
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | HotpotQA | F1 | 47.14 | 221 |
| Long-context Language Understanding | LongBench | M-Avg | 24.87 | 219 |
| Multi-hop Question Answering | MuSiQue | -- | -- | 106 |
| Question Answering | NarrativeQA | F1 | 11.3 | 87 |
| Question Answering | Qasper | F1 | 21.87 | 61 |
| Multi-hop Question Answering | 2WikiMHQA | F1 | 17.56 | 55 |
| Summarization | MultiNews | ROUGE-L | 23.66 | 21 |
| Question Answering | MultiFieldQA-en | F1 | 43.54 | 21 |
| Summarization | GovReport | ROUGE-L | 19.55 | 21 |