Tracking vs. Deciding: The Dual-Capability Bottleneck in Searchless Chess Transformers

About

A human-like chess engine should mimic the style, errors, and consistency of a strong human player rather than maximize playing strength. We show that training from move sequences alone forces a model to learn two capabilities: state tracking, which reconstructs the board from move history, and decision quality, which selects good moves from that reconstructed state. These impose contradictory data requirements: low-rated games provide the diversity needed for tracking, while high-rated games provide the quality signal for decision learning. Removing low-rated data degrades performance. We formalize this tension as a dual-capability bottleneck, P ≤ min(T, Q), where overall performance is limited by the weaker capability. Guided by this view, we scale the model from 28M to 120M parameters to improve tracking, then introduce Elo-weighted training to improve decisions while preserving diversity. A 2 × 2 factorial ablation shows that scaling improves tracking, weighting improves decisions, and their combination is superadditive. Linear weighting works best, while overly aggressive weighting harms tracking despite lower validation loss. We also introduce a coverage-decay formula, t* = log(N / k_crit) / log b, as a reliability horizon for intra-game degeneration risk. Our final 120M-parameter model, without search, reached a Lichess bullet rating of 2570 over 253 rated games. On human move prediction it achieves 55.2% Top-1 accuracy, exceeding Maia-2 rapid and Maia-2 blitz. Unlike position-based methods, sequence input naturally encodes full game history, enabling history-dependent decisions that single-position models cannot exhibit.
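The two formulas in the abstract and the idea of linear Elo weighting can be made concrete with a small sketch. The abstract does not define N, k_crit, or b, so the code below assumes N is the number of training games, k_crit is a critical number of games needed to cover a line of play, and b is an effective per-ply branching factor. The function names, the weighting range, and the `floor` parameter are hypothetical and are not taken from the paper; the weighting function only illustrates what "linear" could mean, not the authors' actual scheme.

```python
import math


def reliability_horizon(n_games: float, k_crit: float, branching: float) -> float:
    """Return t* = log(N / k_crit) / log(b), the formula quoted in the abstract.

    Interpretation of the symbols is an assumption (see the text above).
    """
    if n_games <= 0 or k_crit <= 0:
        raise ValueError("n_games and k_crit must be positive")
    if branching <= 1:
        raise ValueError("branching must exceed 1 so that log(b) > 0")
    return math.log(n_games / k_crit) / math.log(branching)


def bottleneck_bound(tracking: float, quality: float) -> float:
    """Upper bound on overall performance under P <= min(T, Q)."""
    return min(tracking, quality)


def linear_elo_weight(elo: float, elo_low: float, elo_high: float,
                      floor: float = 0.2) -> float:
    """Hypothetical linear sample weight: `floor` at elo_low, 1.0 at elo_high.

    A non-zero floor keeps low-rated games in the mix, which the abstract
    says are needed for state tracking.
    """
    if elo_high <= elo_low:
        raise ValueError("elo_high must be greater than elo_low")
    frac = (elo - elo_low) / (elo_high - elo_low)
    frac = min(1.0, max(0.0, frac))  # clamp ratings outside the range
    return floor + (1.0 - floor) * frac


if __name__ == "__main__":
    # Illustrative numbers only; none of these values come from the paper.
    print(reliability_horizon(n_games=1e8, k_crit=100, branching=10))
    print(bottleneck_bound(tracking=0.9, quality=0.6))
    print(linear_elo_weight(1500, elo_low=1000, elo_high=2000))
```

With the illustrative inputs above (10^8 games, k_crit = 100, b = 10), the horizon works out to roughly 6 plies, since log(10^6) / log(10) = 6. The bound shows why raising only one capability stops helping once the other becomes the smaller of the two.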

Quanhao Li, Wei Jiang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Human move prediction | 12,000 balanced bullet decision points (overall) | Top-1 Accuracy (%) | 55.2 | 3 |
| Human move prediction | 12,000 balanced bullet decision points (Elo 2100-2300) | Top-1 Accuracy (%) | 54.3 | 3 |
| Human move prediction | 12,000 balanced bullet decision points (Elo 2300-2500) | Top-1 Accuracy (%) | 53.7 | 3 |
| Human move prediction | 12,000 balanced bullet decision points (Elo 2500-2700) | Top-1 Accuracy (%) | 56.2 | 3 |
| Human move prediction | 12,000 balanced bullet decision points (Elo 2700+) | Top-1 Accuracy (%) | 56.8 | 3 |
| Human move prediction | 12,000 balanced bullet decision points (opening phase) | Top-1 Accuracy (%) | 55.2 | 3 |
| Human move prediction | 12,000 balanced bullet decision points (middlegame phase) | Top-1 Accuracy (%) | 55.2 | 3 |
| Human move prediction | 12,000 balanced bullet decision points (endgame phase) | Top-1 Accuracy (%) | 55.3 | 3 |
| Human-blunder alignment | Rated bullet games | P(Model Blunder \| Human Blunder) | 49.2 | 3 |
| Chess gameplay | Lichess | Bullet Elo Rating (Lichess) | 2570 | 1 |
Showing 10 of 12 rows
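
For readers unfamiliar with the two accuracy-style metrics in the table, the following is a minimal sketch of how they are conventionally computed. It assumes moves are given as strings (for example UCI notation) and blunders as boolean flags per decision point; the function names and input format are hypothetical, and the paper's own evaluation pipeline, including how a blunder is defined, may differ.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[str], played: Sequence[str]) -> float:
    """Percentage of positions where the model's top move equals the human move."""
    if len(predicted) != len(played) or not played:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == h for p, h in zip(predicted, played))
    return 100.0 * hits / len(played)


def blunder_alignment(model_blunder: Sequence[bool],
                      human_blunder: Sequence[bool]) -> float:
    """Estimate P(model blunder | human blunder) as a percentage.

    Only positions where the human blundered enter the denominator.
    """
    if len(model_blunder) != len(human_blunder):
        raise ValueError("inputs must be of equal length")
    conditioned = [m for m, h in zip(model_blunder, human_blunder) if h]
    if not conditioned:
        raise ValueError("no human blunders to condition on")
    return 100.0 * sum(conditioned) / len(conditioned)


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the benchmark.
    model_moves = ["e2e4", "g1f3", "d2d4", "c2c4"]
    human_moves = ["e2e4", "g1f3", "e2e4", "c2c4"]
    print(top1_accuracy(model_moves, human_moves))

    print(blunder_alignment([True, False, True, False],
                            [True, True, False, False]))
```

In the toy data, three of four predicted moves match, and the model blunders in one of the two positions where the human blundered.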