The Condensate Theorem: Transformers are $O(n)$, Not $O(n^2)$

About

We present the Condensate Theorem: attention sparsity is a learned topological property, not an architectural constraint. Through empirical analysis of trained language models, we find that attention mass concentrates on a distinct topological manifold, and that this manifold can be identified dynamically without checking every position. We prove a general result: for any query, projecting attention onto the Condensate Manifold (Anchor + Window + Dynamic Top-k) achieves 100% output equivalence with full $O(n^2)$ attention. This is not an approximation; it is lossless parity. We validate this across GPT-2, Pythia, Qwen2, TinyLlama, and Mistral, demonstrating bit-exact token matching on 1,500+ generated tokens. By mapping this topology to hardware, our Topological Attention kernel achieves a 159x measured speedup at 131K tokens (3.94 ms vs 628 ms) and a projected >1,200x speedup at 1M tokens, reducing inference costs by >99.9% compared to Flash Attention. We conclude that the quadratic bottleneck is an artifact of naive implementation, not intelligence.
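To make the "Anchor + Window + Dynamic Top-k" projection concrete, here is a minimal single-query sketch in NumPy. It is an illustration under stated assumptions, not the authors' kernel: the function names (`condensate_indices`, `sparse_attention`) and the default budgets (`n_anchor`, `window`, `top_k`) are hypothetical, "anchor" is assumed to mean the first few positions, and "window" the most recent positions. The sketch also scores every key before selecting the top-k, so it does not reproduce the claimed ability to find the manifold without checking every position; it only shows what restricting the softmax to the selected set looks like.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def condensate_indices(scores, n_anchor, window, top_k):
    """Key positions kept for one query: anchors, recent window, top-k scores.

    Assumption: anchors are the first n_anchor positions and the window is
    the last `window` positions; the abstract does not define either.
    """
    n = scores.shape[0]
    keep = set(range(min(n_anchor, n)))          # anchor positions
    keep.update(range(max(0, n - window), n))    # local window
    k = min(top_k, n)
    if k > 0:
        # Illustrative only: this scans all n scores to find the top-k.
        top = np.argpartition(scores, -k)[-k:]
        keep.update(int(i) for i in top)
    return np.array(sorted(keep))

def full_attention(q, K, V):
    # Dense attention for a single query vector q against all keys.
    scores = K @ q / np.sqrt(q.shape[0])
    return softmax(scores) @ V

def sparse_attention(q, K, V, n_anchor=4, window=16, top_k=8):
    # Softmax restricted to the selected subset of key positions.
    scores = K @ q / np.sqrt(q.shape[0])
    idx = condensate_indices(scores, n_anchor, window, top_k)
    return softmax(scores[idx]) @ V[idx], idx
```

With random keys and values the sparse and dense outputs will generally differ; the abstract's parity claim concerns trained models, where attention mass is reported to concentrate on the selected positions. Comparing `sparse_attention` against `full_attention` on real model activations is one way to probe that claim.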

Jorge L. Ruiz Williams • 2026

Related benchmarks

Task	Dataset	Result
Attention Mechanism Latency Benchmark	Synthetic sequences	Latency (ms): 0.03	16
Latency Measurement	Synthetic sequences performance benchmarking	Latency (ms): 0.03	11
Needle Retrieval	Needle Retrieval	Time (ms): 6.5	9

