
The Condensate Theorem: Transformers Are $O(n)$, Not $O(n^2)$

About

We present the Condensate Theorem: attention sparsity is a learned topological property, not an architectural constraint. Through empirical analysis of trained language models, we find that attention mass concentrates on a distinct topological manifold, and that this manifold can be identified dynamically without checking every position. We prove a general result: for any query, projecting attention onto the Condensate Manifold (Anchor + Window + Dynamic Top-k) achieves 100% output equivalence with full $O(n^2)$ attention. This is not an approximation; it is lossless parity. We validate this across GPT-2, Pythia, Qwen2, TinyLlama, and Mistral, demonstrating bit-exact token matching on more than 1,500 generated tokens. By mapping this topology to hardware, our Topological Attention kernel achieves a measured 159x speedup at 131K tokens (3.94 ms vs. 628 ms) and a projected >1,200x speedup at 1M tokens, reducing inference costs by >99.9% compared to FlashAttention. We conclude that the quadratic bottleneck is an artifact of naive implementation, not of intelligence.
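The projection described above selects, for each query, three sets of keys: fixed anchor positions, a recent local window, and a dynamic top-k of the remaining positions. The following is a minimal NumPy sketch of that masking for a single query; function names and defaults are hypothetical, and it scores every position to pick the top-k, so it illustrates the selection rule only, not the paper's sub-quadratic kernel.

```python
import numpy as np

def condensate_attention(q, K, V, n_anchor=4, window=64, top_k=32):
    """Anchor + Window + Dynamic Top-k attention for one query (sketch).

    Hypothetical illustration: the paper's kernel reportedly locates the
    heavy keys without scoring all n positions; here we compute all n
    scores first, so only the masking is demonstrated.
    """
    n, d = K.shape
    scores = K @ q / np.sqrt(d)                 # (n,) raw attention logits

    keep = np.zeros(n, dtype=bool)
    keep[:n_anchor] = True                      # Anchor: initial "sink" tokens
    keep[max(0, n - window):] = True            # Window: recent local context
    rest = np.where(~keep)[0]                   # Dynamic Top-k over the rest
    if rest.size > 0 and top_k > 0:
        keep[rest[np.argsort(scores[rest])[-top_k:]]] = True

    masked = np.where(keep, scores, -np.inf)    # drop everything off-manifold
    w = np.exp(masked - masked.max())
    w /= w.sum()
    return w @ V

def full_attention(q, K, V):
    """Dense softmax attention baseline for comparison."""
    s = K @ q / np.sqrt(K.shape[1])
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V
```

When attention mass is concentrated on the selected manifold (as the paper claims for trained models), the masked output matches the dense output; in the degenerate case where the three sets cover all positions, the two are identical by construction.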

Jorge L. Ruiz Williams • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Attention Mechanism Latency Benchmark | Synthetic sequences | Latency (ms): 0.03 | 16 |
| Latency Measurement | Synthetic sequences performance benchmarking | Latency (ms): 0.03 | 11 |
| Needle Retrieval | Needle Retrieval | Time (ms): 6.5 | 9 |
