Don't Read Everything: A Curvature-Conditioned Query for Linear Attention

About

Linear attention reduces the quadratic cost of softmax attention by maintaining a recurrent fast-weight state, but it consistently lags on in-context retrieval and long-context tasks. Existing remedies act on the write side of memory through gating, delta updates, or kernel feature maps, but the read step is left unchanged: every past key contributes additively to the output, so useful targets are diluted by the bulk of stored vectors. We borrow one specific piece of softmax's geometry to construct a cheap read-time contraction of the query. A second-order Taylor expansion of the softmax log-partition at the isotropic-attention point gives a local quadratic model whose curvature coincides with the running key covariance, a quantity that can be maintained with the same recurrent/chunkwise mechanism as the linear-attention state. The associated linear operator contracts the query along the high-density directions of memory before it reads the state. We call this mechanism Curvature-Conditioned Query (CCQ). CCQ modifies only the read step and is composable with any linear-attention backbone. Attached to GLA and Gated DeltaNet, it improves perplexity, zero-shot downstream accuracy, S-NIAH retrieval at and beyond the training context, length-extrapolation perplexity from 4K to 20K, and LongBench accuracy, at small extra cost.

Dong Le, Thong Nguyen, Cong-Duy Nguyen, Anh Tuan Luu • 2026
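
The curvature claim in the abstract can be checked numerically. The sketch below is not the authors' implementation: it assumes the isotropic-attention point is q = 0 (uniform attention over stored keys), where the log-partition log Σ_i exp(q·k_i) has gradient equal to the mean key and Hessian equal to the uniform-weight key covariance. The contraction `contract_query`, its form (I + lam·Σ)^(-1) q, and the parameter `lam` are hypothetical illustrations of "contracting the query along high-density directions"; the abstract does not specify the actual operator.

```python
import numpy as np

def log_partition(q, K):
    """Softmax log-partition log sum_i exp(q . k_i), computed stably."""
    s = K @ q
    m = s.max()
    return m + np.log(np.exp(s - m).sum())

def key_covariance(K):
    """Covariance of the stored keys under uniform weights."""
    mu = K.mean(axis=0)
    return (K.T @ K) / K.shape[0] - np.outer(mu, mu)

def numerical_hessian(f, x, eps=1e-3):
    """Central finite-difference Hessian of a scalar function f at x."""
    d = x.size
    E = eps * np.eye(d)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * eps**2)
    return H

def contract_query(q, cov, lam=1.0):
    """Hypothetical contraction (an assumption, not from the paper):
    solve (I + lam * cov) q' = q, which shrinks q most along the
    directions where the keys have the largest variance."""
    return np.linalg.solve(np.eye(q.size) + lam * cov, q)

# Synthetic keys with very different variance per dimension.
rng = np.random.default_rng(0)
K = rng.normal(size=(256, 4)) * np.array([3.0, 1.0, 0.5, 0.1])

cov = key_covariance(K)
H = numerical_hessian(lambda q: log_partition(q, K), np.zeros(4))
hessian_error = np.abs(H - cov).max()  # small: curvature at q = 0 matches cov

q = np.ones(4)
q_contracted = contract_query(q, cov)
```

Because the covariance is a sum of outer products, it could in principle be accumulated token by token alongside the fast-weight state, which is the property the abstract relies on. In this toy setup the first coordinate of `q_contracted` is shrunk far more than the last, since the keys vary most along the first dimension.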

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Commonsense Reasoning | WinoGrande | -- | 1442 |
| Language Modeling | WikiText | Word Perplexity 16.92 | 234 |
| Commonsense Reasoning | PIQA | Accuracy 72.03 | 213 |
| Question Answering | BoolQ | Accuracy 60.37 | 201 |
| Commonsense Reasoning | SIQA | Accuracy 41.71 | 168 |
| Question Answering | ARC Challenge | Normalized Accuracy 38.82 | 105 |
| Question Answering | ARC Easy | Normalized Accuracy 66.46 | 55 |
| Common Sense Reasoning | HellaSwag | -- | 47 |
| Long-context retrieval | NIAH Single-3 | Accuracy (1024) 94.4 | 22 |
| Common Sense Reasoning | Common-sense Reasoning Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, SIQA, BoolQ) | Average Accuracy 56.24 | 14 |
Showing 10 of 14 rows
