| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Visual Agentic Reasoning | Sokoban | Success Rate | 85 | 27 |
| Spatial planning | Sokoban | Success Rate | 79 | 19 |
| Sokoban | Sokoban (Hard-2) | Accuracy | 87.89 | 14 |
| Sokoban | Sokoban (Hard-1) | Accuracy | 91.41 | 14 |
| Sokoban | Sokoban (Standard) | Accuracy | 98.83 | 14 |
| Puzzle Solving | Sokoban (Out of Distribution) | Avg@128 | 66.9 | 12 |
| Puzzle Solving | Sokoban (In Distribution) | Average Score @128 | 89.7 | 12 |
| Agentic Reasoning | Sokoban (test) | Success Rate | 38.3 | 12 |
| Sokoban | Sokoban (test) | Success Rate | 90.6 | 12 |
| Planning | Full Sokoban | Validity Rate | 46 | 12 |
| Planning | Sokoban Grid | Validity Rate | 63 | 12 |
| Planning | Sokoban unseen problems | Completion Rate | 100 | 11 |
| Planning | Sokoban known optimal problems | Optimal Rate | 1 | 11 |
| Video Reasoning | Sokoban (test) | Precision | 34 | 11 |
| Interactive Agent | Sokoban | Pass@1 | 64.6 | 10 |
| Sokoban Puzzle | Sokoban Symbol variant | Box Placement Score | 2 | 10 |
| Sokoban Puzzle | Sokoban Action variant | Box Placement Score | 1.83 | 10 |
| Sokoban Puzzle | Sokoban Base | Box Placement Score | 1.89 | 10 |
| Multi-turn RL navigation | Sokoban held-out (val) | Success Rate | 52.3 | 10 |
| Planning | Sokoban | Completion Rate | 100 | 9 |
| Puzzle Solving | Sokoban | Success Rate | 43.6 | 8 |
| Single-Agent Spatial Puzzles | Sokoban (In-domain) | Success Rate | 77.3 | 8 |
| Reinforcement Learning | Sokoban | Reward | 0.87 | 8 |
| Planning | Sokoban | p@1 | 42.4 | 8 |
| Sokoban | Sokoban 20 x 20, 4 boxes (test) | Success Rate | 77 | 8 |