Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

About

Scaling test-time compute by iteratively updating a latent state has emerged as a powerful paradigm for reasoning. Yet the internal mechanisms that enable these iterative models to generalize beyond memorized patterns remain unclear. We hypothesize that generalizable reasoning arises from learning task-conditioned attractors: latent dynamical systems whose stable fixed points correspond to valid solutions. We formalize this process through Equilibrium Reasoners (EqR), which enable test-time scaling without external verifiers or task-specific priors. EqR scales internal dynamics along two axes: depth, by running more iterations, and breadth, by aggregating stochastic trajectories from multiple initializations. Empirically, gains from test-time scaling are tightly coupled with stronger convergence toward solution-aligned attractors. This attractor perspective allows neural networks to adaptively allocate test-time compute based on task difficulty. While simple cases converge within 1 to 5 iteration steps, harder cases benefit from massive test-time scaling. By unrolling up to the equivalent of 40,000 layers, scalable latent reasoning boosts accuracy from 2.6% for feedforward models to over 99% on Sudoku-Extreme. These results suggest that learned attractor landscapes provide a useful mechanistic lens for understanding scalable reasoning in iterative latent models.

Benhao Huang, Zhengyang Geng, Zico Kolter • 2026
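
The depth and breadth axes described in the abstract can be illustrated with a toy fixed-point solver. This is a minimal sketch under loud assumptions, not the authors' EqR implementation: the `update` map is a hand-written Babylonian square-root step standing in for a learned network, and the function names, the tolerance, the initialization range, and the median aggregation are all hypothetical choices made for illustration. The only property it shares with the paper's setup is that the solution is a stable fixed point of the iterated map.

```python
import random
import statistics


def update(z, x):
    """Hand-written stand-in for a learned update (assumption).

    For x > 0 and z > 0, the stable fixed point of this map is sqrt(x).
    """
    return 0.5 * (z + x / z)


def run_to_equilibrium(x, z0, tol=1e-10, max_steps=1000):
    """Depth axis: iterate until the state stops moving or the budget runs out."""
    z = z0
    for step in range(1, max_steps + 1):
        z_next = update(z, x)
        # Relative residual between consecutive states acts as the stopping rule.
        if abs(z_next - z) <= tol * max(1.0, abs(z_next)):
            return z_next, step
        z = z_next
    return z, max_steps


def solve(x, n_trajectories=8, seed=0):
    """Breadth axis: run several randomly initialized trajectories and aggregate."""
    rng = random.Random(seed)
    results = [
        run_to_equilibrium(x, rng.uniform(0.5, 2.0))
        for _ in range(n_trajectories)
    ]
    answer = statistics.median([z for z, _ in results])
    mean_steps = statistics.mean([steps for _, steps in results])
    return answer, mean_steps


if __name__ == "__main__":
    for x in (4.0, 1e8):
        answer, mean_steps = solve(x)
        print(f"x={x:g}  answer={answer:.6f}  mean steps={mean_steps:.1f}")
```

Because the stopping rule depends on the residual rather than on a fixed step count, an input whose solution lies far from the initial states (here `x = 1e8`) uses more iterations than one whose solution lies close (`x = 4.0`). That mirrors, in a very reduced form, the adaptive allocation of test-time compute the abstract describes.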

Related benchmarks

Task	Dataset	Metric	Result
Sudoku Solving	Sudoku-Extreme (test)	Accuracy	99.8	31
Reasoning	Sudoku Extreme	Pass@1 Accuracy	99.8	21
Reasoning	ARC Mini	Accuracy	55.28	16
Puzzle Solving	Sudoku-Extreme (test)	Pass@1 Success Rate	93	9
Maze	Maze-Unique (test)	Exact Accuracy	93	7

