| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Logical Reasoning | Sudoku | Accuracy | 94.3 | 142 |
| Planning | Sudoku | Accuracy | 92.7 | 129 |
| Sudoku Solving | Sudoku (test) | Accuracy | 100 | 27 |
| Reasoning | Sudoku | Pass@1 | 96 | 26 |
| Reasoning | Sudoku | Accuracy (Sudoku Reasoning) | 91.9 | 25 |
| Logical planning | Sudoku (test) | Accuracy | 91.7 | 24 |
| Reasoning | Sudoku Extreme | Pass@1 Accuracy | 99.8 | 21 |
| Puzzle Solving | Sudoku | Test Accuracy | 25.39 | 20 |
| Sudoku | Sudoku | Accuracy | 91.82 | 19 |
| Reasoning | Sudoku (test) | Accuracy | 100 | 19 |
| Sudoku | Sudoku 4x4 (test) | Accuracy (Seq Len 64) | 17.5 | 18 |
| Agent Task | Sudoku | Success Rate (SR) | 99 | 17 |
| Agent Behavior Adaptation | Sudoku (Su) (test) | Loop Ratio | 34.3 | 17 |
| Sudoku Solving | Sudoku 512 tokens | Pass@1 | 80.3 | 15 |
| Sudoku Solving | Sudoku 256 tokens | Pass@1 | 92.9 | 15 |
| Planning and Reasoning | Sudoku | Accuracy | 99.4 | 14 |
| Sudoku Solving | Sudoku 2x2 | Final Reward | 1.3 | 14 |
| Puzzle Solving | Sudoku Out Of Distribution | Average @128 | 77.8 | 12 |
| Puzzle Solving | Sudoku In Distribution | Average Score @128 | 97.1 | 12 |
| Sudoku Solving | Sudoku 10k 9x9 boards (val) | Board Accuracy | 93.4 | 12 |
| Constraint Satisfaction | Sudoku | CSP Result Index 3 | 557 | 12 |
| Sudoku Solving | Sudoku | Success Rate (pass@1) | 100 | 12 |
| Sudoku puzzle solving | Sudoku Hard 30/81 digits visible (test) | Exact Match Accuracy | 58.4 | 10 |
| Sudoku puzzle solving | Sudoku Med. 35/81 digits visible (test) | Exact Match Accuracy | 85.2 | 10 |
| Sudoku puzzle solving | Sudoku Easy 40/81 digits visible (test) | Exact Match Accuracy | 96.3 | 10 |
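Several rows above report board-level metrics (Exact Match Accuracy, Board Accuracy). The sketch below shows one common way such metrics can be computed for 9x9 boards. It rests on assumptions not stated in the table: boards are lists of lists with `0` marking blank cells, "exact match" means the entire predicted board equals the reference solution, and results are reported as percentages. Individual leaderboard entries may define their metrics differently (for example, per-cell accuracy, or accepting any valid completion). The function names `is_valid_solution` and `exact_match_accuracy` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: a 9x9 board as a list of rows; 0 marks an empty cell.
Grid = List[List[int]]

DIGITS = set(range(1, 10))


def is_valid_solution(puzzle: Grid, solution: Grid) -> bool:
    """Return True if `solution` is a complete 9x9 board that obeys the Sudoku
    rules and keeps every given (non-zero) cell of `puzzle` unchanged."""
    # Shape check: exactly 9 rows of 9 cells each.
    if len(solution) != 9 or any(len(row) != 9 for row in solution):
        return False
    # Each row must contain the digits 1-9 exactly once.
    if any(set(row) != DIGITS for row in solution):
        return False
    # Each column must contain the digits 1-9 exactly once.
    if any({solution[r][c] for r in range(9)} != DIGITS for c in range(9)):
        return False
    # Each 3x3 box must contain the digits 1-9 exactly once.
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            box = {solution[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
            if box != DIGITS:
                return False
    # Every given clue must be preserved in the solution.
    return all(puzzle[r][c] in (0, solution[r][c]) for r in range(9) for c in range(9))


def exact_match_accuracy(predictions: Sequence[Grid], references: Sequence[Grid]) -> float:
    """Percentage of predicted boards that equal their reference board cell for cell
    (assumed definition of exact-match / board accuracy)."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal in length")
    hits = sum(pred == ref for pred, ref in zip(predictions, references))
    return 100.0 * hits / len(references)
```

Under these assumptions, a single wrong cell makes the whole board count as a miss, which is why exact-match scores drop sharply as fewer digits are visible (as in the Easy, Medium, and Hard rows). Because a well-posed Sudoku has a unique solution, checking rule validity with `is_valid_solution` and checking exact match against the reference should agree for such puzzles.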