Tracking vs. Deciding: The Dual-Capability Bottleneck in Searchless Chess Transformers

About

A human-like chess engine should mimic the style, errors, and consistency of a strong human player rather than maximize playing strength. We show that training from move sequences alone forces a model to learn two capabilities: state tracking, which reconstructs the board from move history, and decision quality, which selects good moves from that reconstructed state. These impose contradictory data requirements: low-rated games provide the diversity needed for tracking, while high-rated games provide the quality signal for decision learning. Removing low-rated data degrades performance. We formalize this tension as a dual-capability bottleneck, P ≤ min(T, Q), where overall performance P is limited by the weaker of tracking T and decision quality Q. Guided by this view, we scale the model from 28M to 120M parameters to improve tracking, then introduce Elo-weighted training to improve decisions while preserving diversity. A 2 × 2 factorial ablation shows that scaling improves tracking, weighting improves decisions, and their combination is superadditive. Linear weighting works best, while overly aggressive weighting harms tracking despite lower validation loss. We also introduce a coverage-decay formula, t* = log(N / k_crit) / log b, as a reliability horizon for intra-game degeneration risk. Our final 120M-parameter model, without search, reached a Lichess bullet rating of 2570 over 253 rated games. On human move prediction, it achieves 55.2% Top-1 accuracy, exceeding both Maia-2 rapid and Maia-2 blitz. Unlike position-based methods, sequence input naturally encodes full game history, enabling history-dependent decisions that single-position models cannot exhibit.
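
The coverage-decay formula can be made concrete with a small numerical sketch. The abstract does not define its symbols, so the reading used here is an assumption: N is the number of training games, b is an effective per-ply branching factor, and k_crit is the minimum number of training examples per distinct move sequence needed for reliable behaviour. Under that reading, coverage at ply t is N / b^t, and t* is the ply at which it falls to k_crit. The function names are hypothetical and are not taken from the paper.

```python
import math


def expected_coverage(n_games: float, branching: float, ply: float) -> float:
    """Assumed coverage model: N games spread evenly over b**t sequences at ply t."""
    return n_games / branching ** ply


def reliability_horizon(n_games: float, k_crit: float, branching: float) -> float:
    """Evaluate t* = log(N / k_crit) / log(b).

    The symbol meanings are assumptions (see the text above); only the
    formula itself comes from the abstract.
    """
    if n_games <= 0 or k_crit <= 0:
        raise ValueError("n_games and k_crit must be positive")
    if branching <= 1:
        raise ValueError("branching must be greater than 1")
    return math.log(n_games / k_crit) / math.log(branching)


if __name__ == "__main__":
    # Illustrative numbers only; they are not values reported in the paper.
    t_star = reliability_horizon(n_games=1024, k_crit=1, branching=2)
    print(f"t* = {t_star:.2f} plies")
    print(f"coverage at t* = {expected_coverage(1024, 2, t_star):.2f}")
```

Solving N / b^t = k_crit for t gives exactly the stated expression, so under this reading the horizon grows only logarithmically with dataset size.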

Quanhao Li, Wei Jiang • 2026

Related benchmarks

Task	Dataset	Result
Human move prediction	12,000 balanced bullet decision points (Overall)	Top-1 Accuracy 55.2	3
Human move prediction	12,000 balanced bullet decision points (Elo 2100-2300)	Top-1 Accuracy 54.3	3
Human move prediction	12,000 balanced bullet decision points (Elo 2300-2500)	Top-1 Accuracy 53.7	3
Human move prediction	12,000 balanced bullet decision points (Elo 2500-2700)	Top-1 Accuracy 56.2	3
Human move prediction	12,000 balanced bullet decision points (Elo 2700+)	Top-1 Accuracy 56.8	3
Human move prediction	12,000 balanced bullet decision points (Opening phase)	Top-1 Accuracy 55.2	3
Human move prediction	12,000 balanced bullet decision points (Middlegame phase)	Top-1 Accuracy 55.2	3
Human move prediction	12,000 balanced bullet decision points (Endgame phase)	Top-1 Accuracy 55.3	3
Human-blunder alignment	rated bullet games	P(Model Blunder | Human Blunder) 49.2	3
Chess Gameplay	Lichess	Bullet Elo Rating (Lichess) 2570	1

Showing 10 of 12 rows
