| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Maze Navigation | Maze (test) | Success Rate: 0.8 | 25 | |
| Multi-Agent Path Finding | Medium Maze 25x25 world size, 32.8% static obstacle rate | Success Rate: 100 | 20 | |
| Maze Navigation | Maze Hard | Accuracy: 97.66 | 18 | |
| One-step next-observation prediction | Maze (test) | Token F1: 98 | 16 | |
| Reasoning | Maze Hard | pass@1 Accuracy: 93.7 | 15 | |
| Maze Navigation | Maze (Standard) | Accuracy: 0.9961 | 14 | |
| Sequential Planning | Maze | Score (L=8): 100 | 12 | |
| Planning | 10x10 Maze | Validity Rate: 57 | 12 | |
| Video Reasoning | Maze (test) | Precision: 82.1 | 11 | |
| Multi-Objective Reinforcement Learning | Maze | Mean Episode Reward (MER): 223.55 | 11 | |
| Spatial Reasoning | Maze 10×10 | CR (%): 61.37 | 10 | |
| Logical Reasoning | Maze | Pass@1: 98 | 10 | |
| Video Generation | Maze | Maze Flow (Base): 96.5 | 10 | |
| Visual Reasoning | Maze | Accuracy (Scale 8): 100 | 10 | |
| Planning | Maze | Success Rate: 0.63 | 10 | |
| Goal-reaching | maze_large (test) | Success Rate: 70.5 | 10 | |
| Pathfinding | Maze hard (test) | Accuracy: 85.3 | 9 | |
| Maze solving | Maze (test) | Accuracy: 99.9 | 9 | |
| Reinforcement Learning | Maze 17^10 structured discrete | Mean Score: 9.61 | 9 | |
| Maze navigation | Maze 100 held-out mazes | Best Success Rate @ 352.6 | 8 | |
| Maze Solving | Maze Hard | RSR: 71.8 | 8 | |
| Reinforcement Learning | Maze | Mean Reward: 1.03 | 8 | |
| Convex Free-Space Approximation | Maze Line Seed | Time (ms): 1.48 | 8 | |
| Convex Free-Space Approximation | Maze Point Seed | Time (ms): 1.32 | 8 | |
| Visual Planning | MAZE | EM: 74.5 | 8 | |