
Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics

About

Linear-time attention and State Space Models (SSMs) promise to solve the quadratic cost bottleneck that softmax attention imposes on long-context language models. We introduce Error-Free Linear Attention (EFLA), a numerically stable, fully parallel, and generalized formulation of the delta rule. Specifically, we formulate the online learning update as a continuous-time dynamical system and prove that its exact solution is not only attainable but also computable in linear time with full parallelism. By leveraging the rank-1 structure of the dynamics matrix, we derive the exact closed-form solution directly. This attention mechanism is theoretically free from error accumulation, perfectly capturing the continuous dynamics while preserving linear-time complexity. Through an extensive suite of experiments, we show that EFLA is robust in noisy environments, achieving lower language modeling perplexity and stronger downstream benchmark performance than DeltaNet without introducing additional parameters. Our work provides a new theoretical foundation for building high-fidelity, scalable linear-time attention models.
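The continuous-time view of the delta rule can be sketched numerically. Assuming the online update S ← S + β (v − S k)kᵀ and its continuous-time relaxation dS/dt = β (v − S k)kᵀ (notation chosen here for illustration, not taken from the paper), the rank-1 structure of the dynamics means only the component of S along k evolves, so the ODE has a closed-form solution: the exact one-token update is an Euler-style step with effective rate (1 − e^{−β‖k‖²})/‖k‖². A minimal sketch under these assumptions:

```python
import numpy as np

def euler_delta_step(S, k, v, beta):
    """One discrete delta-rule update (a single Euler step of the ODE):
    S <- S + beta * (v - S k) k^T.
    S: (d_v, d_k) state matrix, k: (d_k,) key, v: (d_v,) value."""
    return S + beta * np.outer(v - S @ k, k)

def exact_delta_step(S, k, v, beta):
    """Exact solution at t = 1 of dS/dt = beta * (v - S k) k^T.
    Because the dynamics matrix beta * k k^T is rank-1, only the
    component of S along k moves, and the step reduces to an
    Euler-style update with an effective rate."""
    a = k @ k  # ||k||^2
    beta_eff = (1.0 - np.exp(-beta * a)) / a
    return S + beta_eff * np.outer(v - S @ k, k)
```

As a sanity check, running many small Euler substeps converges to the closed-form step, while the exact step costs the same as a single discrete update; this is the sense in which error-free integration comes "for free".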

Jingdi Lei, Di Zhang, Soujanya Poria • 2025

Related benchmarks

Task                    Dataset        Metric               Result   Rank
----------------------  -------------  -------------------  -------  ----
Commonsense Reasoning   HellaSwag      Accuracy             44.5     1460
Commonsense Reasoning   WinoGrande     Accuracy             52.1     776
Commonsense Reasoning   PIQA           Accuracy             68.9     647
Language Modeling       WikiText       Perplexity (PPL)     18.3     479
Question Answering      BoolQ          --                   --       240
Question Answering      SciQ           Accuracy             84.2     226
Language Modeling       LAMBADA        Accuracy             43.2     183
Commonsense Reasoning   ARC Challenge  Accuracy             26.4     132
Commonsense Reasoning   ARC Easy       Accuracy             54.4     52
Question Answering      OpenBookQA     Normalized Accuracy  31.6     35

(10 of 11 rows shown)

Other info

GitHub
