| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Maze Navigation | Maze (test) | Success Rate | 0.8 | 25 |
| Maze Navigation | Maze Hard | Accuracy | 97.66 | 14 |
| Maze Navigation | Maze (Standard) | Accuracy | 0.9961 | 14 |
| Sequential Planning | Maze | Score (L=8) | 100 | 12 |
| Planning | 10x10 Maze | Validity Rate | 57 | 12 |
| Multi-Objective Reinforcement Learning | Maze | Mean Episode Reward (MER) | 223.55 | 11 |
| Video Generation | Maze | Maze Flow (Base) | 96.5 | 10 |
| Visual Reasoning | Maze | Accuracy (Scale 8) | 100 | 10 |
| Planning | Maze | Success Rate | 0.63 | 10 |
| Goal-reaching | maze_large (test) | Success Rate | 70.5 | 10 |
| Reinforcement Learning | Maze 17^10 structured discrete | Mean Score | 9.61 | 9 |
| Visual Planning | MAZE | EM | 74.5 | 8 |
| Multimodal Maze Solving | MAZE | Pass@1 Accuracy | 84 | 8 |
| Multi-agent coordination | Maze Structure Map | Final Cumulative Win Rate (FW) | 81.15 | 7 |
| Reinforcement Learning | Maze 5^4 unstructured discrete | Mean Performance | 9.57 | 7 |
| Reinforcement Learning | Maze 5^4 structured discrete | Mean Score | 9.74 | 7 |
| Problem Solving and Unsolvability Detection | Maze Hard | Solvable Accuracy | 98 | 7 |
| Problem Solving and Unsolvability Detection | Maze Easy | Accuracy (Solvable) | 100 | 7 |
| Reasoning | Maze Hard | pass@1 Accuracy | 93.7 | 6 |
| Multi-Agent Path Finding | maze-32-32-4 (# agents: 30) | UA Conflicts | 4.75 | 6 |
| Multi-Agent Path Finding | maze 32-32-2 (# agents: 30) | UA Conflicts | 7.59 | 6 |
| Hierarchical Planning | Maze | Token Cost | 3,518 | 6 |
| Maze solving | Maze (test) | Accuracy | 85.3 | 4 |
| Safe Navigation | Maze 2 | Success Rate (SR) | 100 | 4 |
| Safe Navigation | Maze 1 | Success Rate (SR) | 100 | 4 |