Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning

About

Scaling test-time compute by iteratively updating a latent state has emerged as a powerful paradigm for reasoning. Yet the internal mechanisms that enable these iterative models to generalize beyond memorized patterns remain unclear. We hypothesize that generalizable reasoning arises from learning task-conditioned attractors: latent dynamical systems whose stable fixed points correspond to valid solutions. We formalize this process through Equilibrium Reasoners (EqR), which enable test-time scaling without external verifiers or task-specific priors. EqR scales internal dynamics along two axes: depth, by running more iterations, and breadth, by aggregating stochastic trajectories from multiple initializations. Empirically, gains from test-time scaling are tightly coupled with stronger convergence toward solution-aligned attractors. This attractor perspective allows neural networks to adaptively allocate test-time compute based on task difficulty. While simple cases converge within 1 to 5 iteration steps, harder cases benefit from massive test-time scaling. By unrolling up to the equivalent of 40,000 layers, scalable latent reasoning boosts accuracy from 2.6% for feedforward models to over 99% on Sudoku-Extreme. These results suggest that learned attractor landscapes provide a useful mechanistic lens for understanding scalable reasoning in iterative latent models.
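To make the two scaling axes concrete, here is a minimal toy sketch in Python. It is not the authors' model: the learned update is replaced by an assumed hand-written map (`step`, the Babylonian square-root iteration) whose stable fixed point is the "solution" for input `x`. The function names, the tolerance, and the median aggregation rule are all illustrative assumptions; the abstract does not specify how trajectories are aggregated.

```python
import random
import statistics


def step(z, x):
    """One latent update. Toy stand-in for a learned map (assumption):
    its stable fixed point is sqrt(x), playing the role of the solution."""
    return 0.5 * (z + x / z)


def solve_depth(x, z0, max_iters=50, tol=1e-10):
    """Depth scaling: keep iterating until the residual |f(z) - z| is tiny.

    Returns (final state, iterations used, last residual)."""
    z = z0
    residual = float("inf")
    for i in range(1, max_iters + 1):
        z_next = step(z, x)
        residual = abs(z_next - z)
        z = z_next
        if residual < tol:  # converged to the attractor: stop early
            return z, i, residual
    return z, max_iters, residual


def solve_breadth(x, n_init=8, seed=0, **kwargs):
    """Breadth scaling: run several random initializations and aggregate.

    The median is a hypothetical aggregation rule chosen for this sketch."""
    rng = random.Random(seed)
    runs = [solve_depth(x, rng.uniform(0.1, 10.0), **kwargs) for _ in range(n_init)]
    answer = statistics.median(z for z, _, _ in runs)
    return answer, runs


if __name__ == "__main__":
    # A start near the attractor needs few steps; a distant start needs more.
    _, easy_iters, _ = solve_depth(4.0, z0=2.1)
    _, hard_iters, _ = solve_depth(4.0, z0=1000.0)
    print("iterations (near start, far start):", easy_iters, hard_iters)

    answer, runs = solve_breadth(2.0)
    print("aggregated answer:", answer)
```

In this toy, the residual acts as an internal stopping signal, so no external verifier is needed, and the number of iterations used varies with how far the initial state is from the attractor. That mirrors, in a very simplified way, the adaptive allocation of test-time compute described above.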

Benhao Huang, Zhengyang Geng, Zico Kolter • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Sudoku Solving | Sudoku-Extreme (test) | Accuracy: 99.8 | 31 |
| Reasoning | Sudoku Extreme | Pass@1 Accuracy: 99.8 | 21 |
| Maze | Maze-Unique (test) | Exact Accuracy: 93 | 7 |
| Reasoning | ARC Mini | Accuracy: 55.28 | 3 |
