| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Offline Reinforcement Learning | Maze2D medium | Normalized Return: 179.2 | 38 |
| Offline Reinforcement Learning | Maze2D umaze | Normalized Return: 141 | 38 |
| Offline Reinforcement Learning | Maze2D large | Normalized Return: 96.8 | 33 |
| State Exploration | Maze2D Square-b | State Coverage Ratio: 85 | 22 |
| Robotic Path Planning | Maze2D (test) | BS1-1 | 22 |
| Offline Reinforcement Learning | Maze2D large v1 | Normalized Return: 37.7 | 18 |
| Offline Reinforcement Learning | Maze2D medium v1 | Normalized Return: 49.3 | 18 |
| Offline Reinforcement Learning | Maze2D umaze v1 | Normalized Return: 52.2 | 18 |
| Reward Conditioning (RC) | Maze2D (test) | Reward: 2.74 | 16 |
| Behavior Cloning (BC) | Maze2D (test) | Reward: 2.74 | 16 |
| State Exploration | Maze2D Square-tree | State Coverage Ratio: 50 | 11 |
| State Exploration | Maze2D Corridor2 | State Coverage Ratio: 93 | 11 |
| State Exploration | Maze2D Square-d | State Coverage Ratio: 0.77 | 11 |
| State Exploration | Maze2D Square-c | State Coverage Ratio: 74 | 11 |
| State Exploration | Maze2D Square-a | State Coverage Ratio: 87 | 11 |
| Long horizon planning | Maze2D U-Maze | Normalized Return: 185.3 | 10 |
| Offline Reinforcement Learning | Maze2D large v0 (test) | Score: 187.8 | 10 |
| Offline Reinforcement Learning | Maze2D medium v0 (test) | Score: 152.3 | 10 |
| Offline Reinforcement Learning | Maze2D umaze v0 (test) | Overall Score: 111 | 10 |
| Continuous Control | Maze2D large | Total Reward: 361 | 9 |
| Continuous Control | Maze2D medium | Total Reward: 416.28 | 9 |
| Continuous Control | Maze2D umaze | Total Reward: 182.1 | 9 |
| Offline Planning | Maze2D Large single-task D4RL | Normalized Avg Return: 143.9 | 6 |
| Offline Planning | Maze2D U-Maze single-task D4RL | Normalized Avg Return: 109.5 | 6 |
| Transition Synthesis | Maze2D large | Marginal: 0.937 | 5 |