| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Planning | Sudoku | Accuracy 92.7 | 68 |
| Logical Reasoning | Sudoku | Accuracy 89.2 | 44 |
| Sudoku | Sudoku 4x4 (test) | Accuracy (Seq Len 64) 17.5 | 18 |
| Agent Task | Sudoku | Success Rate (SR) 99 | 17 |
| Agent Behavior Adaptation | Sudoku (Su) (test) | Loop Ratio 34.3 | 17 |
| Sudoku Solving | Sudoku 512 tokens | Pass@1 80.3 | 15 |
| Sudoku Solving | Sudoku 256 tokens | Pass@1 92.9 | 15 |
| Sudoku Solving | Sudoku 2x2 | Final Reward 1.3 | 14 |
| Sudoku Solving | Sudoku (test) | Accuracy 76.4 | 12 |
| Constraint Satisfaction | Sudoku | CSP Result Index 3557 | 12 |
| Sudoku Solving | Sudoku | Success Rate (pass@1) 100 | 10 |
| Sequential puzzle-solving | Sudoku | Accuracy 44.2 | 9 |
| Sudoku Solving | Sudoku | Average per-epoch runtime (ms) 12.15 | 9 |
| Reasoning | Sudoku (test) | Accuracy 0.161 | 9 |
| Sudoku Solving | 9x9 Sudoku (test) | Cell Accuracy 52 | 7 |
| Sudoku Solving | Sudoku 5x5 | Final Reward 2.7 | 7 |
| Sudoku Solving | Sudoku 4x4 | Final Reward 2.1 | 7 |
| Sudoku Solving | Sudoku 3x3 | Final Reward 160 | 7 |
| Sudoku Solving | Sudoku (17-givens) | Accuracy 96.6 | 7 |
| Symbolic planning | 4x4 Sudoku | Accuracy (Ngen=128) 26.6 | 6 |
| Reasoning | Sudoku | Avg Diffusion Steps 38.3 | 6 |
| Logical Reasoning | Sudoku 9x9 | Accuracy 0.11 | 5 |
| Sudoku Solving | Sudoku 24-36-givens | Accuracy 70 | 1 |