
Chess

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Classification | chess | F1 Score: 81.19 | 26 |
| Safe Move Prediction | Chess (Exploratory Learning Phase) | Blunder Rate: 24.11 | 16 |
| Classification | chess | Accuracy: 85.4 | 12 |
| Tool Selection | Chess Skill: beginner, intermediate, advanced | Accuracy: 100 | 10 |
| Tool Selection | Chess Specialists: opening, midgame, endgame, late-endgame | Accuracy: 64.4 | 10 |
| Chess Position Evaluation | Chess 1,002 disjoint positions | Mean Evaluation (cp): -243 | 5 |
| Extrapolation | Chess (8, 8) (test) | Accuracy: 84.3 | 4 |
| Board state reconstruction | Chess | Coverage: 98 | 3 |
| Chess | Chess Match vs Stockfish 100 games | Win Rate: 25 | 2 |
| Instruction Recovery | Chess O3 | Extracted Instructions: 126,516,521 | 1 |
| Instruction Recovery | Chess O2 | Extracted Instructions Count: 1,905,184 | 1 |
| Instruction Recovery | Chess O0 | Extracted Instructions Count: 8,568,798 | 1 |