Don't Read Everything: A Curvature-Conditioned Query for Linear Attention

About

Linear attention reduces the quadratic cost of softmax attention by maintaining a recurrent fast-weight state, but it consistently lags on in-context retrieval and long-context tasks. Existing remedies act on the write side of memory through gating, delta updates, or kernel feature maps, but the read step is left unchanged: every past key contributes additively to the output, so useful targets are diluted by the bulk of stored vectors. We borrow one specific piece of softmax's geometry to construct a cheap read-time contraction of the query. A second-order Taylor expansion of the softmax log-partition at the isotropic-attention point gives a local quadratic model whose curvature coincides with the running key covariance, a quantity that can be maintained with the same recurrent/chunkwise mechanism as the linear-attention state. The associated linear operator contracts the query along the high-density directions of memory before it reads the state. We call this mechanism Curvature-Conditioned Query (CCQ). CCQ modifies only the read step and is composable with any linear-attention backbone. Attached to GLA and Gated DeltaNet, it improves perplexity, zero-shot downstream accuracy, S-NIAH retrieval at and beyond the training context, length-extrapolation perplexity from 4K to 20K, and LongBench accuracy, at small extra cost.
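The read-time idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that the isotropic-attention point is q = 0 (where softmax weights are uniform), and that the contraction takes the form (I + lam * Sigma)^(-1) q; that operator, the parameter `lam`, and all function names are hypothetical choices made only for illustration. The sketch checks numerically that the Hessian of the softmax log-partition at q = 0 equals the key covariance, then contracts a query before an unnormalized linear-attention read.

```python
import numpy as np

def running_key_stats(keys):
    """Accumulate key count, sum, and outer-product sum one step at a time."""
    d = keys.shape[1]
    n, key_sum, key_outer = 0, np.zeros(d), np.zeros((d, d))
    for k in keys:  # additive recurrence, like the fast-weight state update
        n += 1
        key_sum += k
        key_outer += np.outer(k, k)
    mean = key_sum / n
    cov = key_outer / n - np.outer(mean, mean)  # population covariance
    return mean, cov

def log_partition(q, keys):
    """Numerically stable log sum_i exp(q . k_i)."""
    z = keys @ q
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

def hessian_at_zero(keys, eps=1e-3):
    """Central finite-difference Hessian of log_partition at q = 0."""
    d = keys.shape[1]
    e = np.eye(d) * eps
    H = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            H[i, j] = (log_partition(e[i] + e[j], keys)
                       - log_partition(e[i] - e[j], keys)
                       - log_partition(e[j] - e[i], keys)
                       + log_partition(-e[i] - e[j], keys)) / (4 * eps**2)
    return H

def contract_query(q, cov, lam=1.0):
    """ASSUMED operator (not from the paper): solve (I + lam * cov) x = q."""
    return np.linalg.solve(np.eye(q.shape[0]) + lam * cov, q)

def linear_attention_read(keys, values, q):
    """Unnormalized linear-attention read: (sum_t k_t v_t^T)^T q."""
    state = keys.T @ values
    return state.T @ q

rng = np.random.default_rng(0)
# Keys are deliberately spread out most along the first axis.
keys = rng.normal(size=(64, 4)) * np.array([3.0, 1.0, 1.0, 0.3])
values = rng.normal(size=(64, 4))
q = np.ones(4)

mean, cov = running_key_stats(keys)
q_c = contract_query(q, cov)            # shrunk most where keys are dense
out = linear_attention_read(keys, values, q_c)
```

Because the covariance is positive semidefinite, this assumed operator never increases the query norm and shrinks it most along the directions in which the stored keys vary the most, which matches the abstract's description of contracting the query along high-density directions of memory.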

Dong Le, Thong Nguyen, Cong-Duy Nguyen, Anh Tuan Luu • 2026

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	WinoGrande	--	1581
Commonsense Reasoning	PIQA	Accuracy 72.03	400
Language Modeling	WikiText	Word Perplexity 16.92	331
Question Answering	BoolQ	Accuracy 60.37	233
Commonsense Reasoning	SIQA	Accuracy 41.71	183
Question Answering	ARC Challenge	Normalized Accuracy 38.82	105
Question Answering	ARC Easy	Normalized Accuracy 66.46	55
Common Sense Reasoning	HellaSwag	--	47
Long-context retrieval	NIAH Single-3	Accuracy (1024) 94.4	22
Common Sense Reasoning	Common-sense Reasoning Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, SIQA, BoolQ)	Average Accuracy 56.24	14

Showing 10 of 14 rows
