| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| VR-Bench | ChEaP | Success Rate (pass@2, Easy) | 72 | 10 | 2mo ago |
| Frozen Lake | EPBS | Success Rate (pass@2, 4x4) | 98.7 | 10 | 2mo ago |
| Maze (test) | Sheaf-ADMM | Accuracy | 99.9 | 9 | 2d ago |
| PerfectMaze XL (held-out) | ADD | Solved Rate | 14 | 9 | 2mo ago |
| PerfectMaze Large (held-out) | TRACED | Solved Rate | 27 | 9 | 2mo ago |
| Maze Hard | GPT-4 + CoT | RSR | 71.8 | 8 | 19d ago |
| Mazes-25 | LocRNN (ACT) | Accuracy | 49.99 | 7 | 3mo ago |
| Mazes-19 | LocRNN (ACT) | Accuracy (Mazes-19) | 86.83 | 7 | 3mo ago |
| Maze 4× OOD | Sheaf-ADMM | Exact Solve Rate | 4.5 | 5 | 2d ago |
| Maze (2× OOD) | Sheaf-ADMM | Exact Solve Rate | 98.1 | 5 | 2d ago |
| Searchformer maze (test) | | Plan Accuracy | 6 | 4 | 7d ago |
| Drunkard maze (test) | | Plan Accuracy | 61.5 | 4 | 7d ago |
| DFS maze (test) | | Plan Accuracy | 99.3 | 4 | 7d ago |
| Kruskal maze (test) | | Plan Accuracy | 99.9 | 4 | 7d ago |
| Wilson maze (test) | | Plan Accuracy | 100 | 4 | 7d ago |
| Maze | McVAMP | Mean Path Length | 0.032 | 3 | 1mo ago |
| Maze-hard (test) | SE-RRM | FSR | 88.8 | 3 | 3mo ago |
| Maze | CMM | Accuracy | 82.2 | 1 | 2mo ago |
| Mazes 25x25 (test) | - | Accuracy | - | 0 | 3mo ago |
| Mazes 19x19 (test) | - | Accuracy | - | 0 | 3mo ago |