
Sudoku

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Logical Reasoning | Sudoku | Accuracy: 94.3 | 119 |
| Planning | Sudoku | Accuracy: 92.7 | 76 |
| Reasoning | Sudoku | Pass@1: 96 | 26 |
| Sudoku | Sudoku 4x4 (test) | Accuracy (Seq Len 64): 17.5 | 18 |
| Agent Task | Sudoku | Success Rate (SR): 99 | 17 |
| Agent Behavior Adaptation | Sudoku (Su) (test) | Loop Ratio: 34.3 | 17 |
| Sudoku Solving | Sudoku 512 tokens | Pass@1: 80.3 | 15 |
| Sudoku Solving | Sudoku 256 tokens | Pass@1: 92.9 | 15 |
| Sudoku Solving | Sudoku 2x2 | Final Reward: 1.3 | 14 |
| Puzzle Solving | Sudoku Out Of Distribution | Average @128: 77.8 | 12 |
| Puzzle Solving | Sudoku In Distribution | Average Score @128: 97.1 | 12 |
| Sudoku Solving | Sudoku 10k 9x9 boards (val) | Board Accuracy: 93.4 | 12 |
| Sudoku Solving | Sudoku (test) | Accuracy: 76.4 | 12 |
| Constraint Satisfaction | Sudoku | CSP Result Index 3557 | 12 |
| Visual Reasoning | Sudoku | Accuracy (Scale 40): 100 | 10 |
| Sudoku Solving | Sudoku | Success Rate (pass@1): 100 | 10 |
| Logic Reasoning | Sudoku 8B Instruct (test) | Accuracy: 71.7 | 9 |
| Combinatorial Reasoning | Sudoku 4x4 | Accuracy: 100 | 9 |
| Combinatorial Reasoning | Sudoku 3x3 | Accuracy: 100 | 9 |
| Sequential puzzle-solving | Sudoku | Accuracy: 44.2 | 9 |
| Sudoku Solving | Sudoku | Average per-epoch runtime (ms): 12.15 | 9 |
| Reasoning | Sudoku (test) | Accuracy: 0.161 | 9 |
| Reasoning | Sudoku | Accuracy (Sudoku Reasoning): 17.2 | 8 |
| Sudoku Solving | 9x9 Sudoku (test) | Cell Accuracy: 52 | 7 |
| Sudoku Solving | Sudoku 5x5 | Final Reward: 2.7 | 7 |

Showing 25 of 38 rows