| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Maze Navigation | Maze (test) | Success Rate | 0.8 | 25 |
| Maze Navigation | Maze Hard | Accuracy | 97.66 | 14 |
| Maze Navigation | Maze (Standard) | Accuracy | 0.9961 | 14 |
| Sequential Planning | Maze | Score (L=8) | 100 | 12 |
| Planning | 10x10 Maze | Validity Rate | 57 | 12 |
| Planning | Maze | Success Rate | 0.63 | 10 |
| Goal-reaching | maze_large (test) | Success Rate | 70.5 | 10 |
| Reinforcement Learning | Maze 17^10 structured discrete | Mean Score | 9.61 | 9 |
| Visual Planning | MAZE | EM | 74.5 | 8 |
| Multimodal Maze Solving | MAZE | Pass@1 Accuracy | 84 | 8 |
| Reinforcement Learning | Maze 5^4 unstructured discrete | Mean Performance | 9.57 | 7 |
| Reinforcement Learning | Maze 5^4 structured discrete | Mean Score | 9.74 | 7 |
| Problem Solving and Unsolvability Detection | Maze Hard | Solvable Accuracy | 98 | 7 |
| Problem Solving and Unsolvability Detection | Maze Easy | Accuracy (Solvable) | 100 | 7 |
| Hierarchical Planning | Maze | Token Cost | 3,518 | 6 |
| Multi-Agent Task Scheduling | Maze \|A\| = 400 (test) | Throughput | 2,417 | 4 |
| Reinforcement Learning | Maze 17^10 unstructured discrete | Mean Score | 9.25 | 4 |
| POMDP Planning | maze-10 POMDP PRISM format (original enlarged) | Value (IQM) | 8.86 | 4 |
| Spatial Reasoning | MAZE | Pass@1 | 85.5 | 4 |
| Extrapolation | Maze (24, 124) (test) | Accuracy | 100 | 4 |
| Long-horizon prediction | Medium Maze | NLL | -0.88 | 4 |
| Maze Path Planning | Maze 48x48 | Validity | 89 | 3 |
| Maze Path Planning | Maze 32x32 | Validity | 88.6 | 3 |
| Maze Path Planning | Maze 16x16 | Validity | 88.6 | 3 |
| Maze Path Planning | Maze 8x8 | Validity | 94 | 3 |