| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Spatial Navigation | 15x15 Grid World 50 environments 20 days | Median Path Cost (Mean): 15.2 | 8 |
| Darkroom | Grid World | Offline Training Time (hour): 0.18 | 6 |
| Goal-driven navigation | Grid-world Overall (unseen maps) | SR: 100 | 5 |
| Goal-driven navigation | Grid-world Unseen Goals (unseen maps) | Success Rate: 100 | 5 |
| Goal-driven navigation | Grid-world Seen Goals (unseen maps) | SR: 100 | 5 |
| Large Dark Key-to-Door | Large Grid World | Offline Training Time (hour): 3.16 | 3 |
| Large Darkroom Dynamic | Large Grid World | Offline Training Time (hour): 2.63 | 3 |
| Large Darkroom Hard | Large Grid World | Offline Training Time (hour): 2.78 | 3 |
| Large Darkroom | Large Grid World | Offline Training Time (hour): 2.38 | 3 |
| Dark Key-to-Door | Grid World | Offline Training Time (hour): 0.41 | 3 |
| Darkroom Hard | Grid World | Offline Training Time (hour): 0.2 | 3 |
| Safety-constrained Reinforcement Learning | Grid-world Time-Variant Safety Threshold (100 randomly generated environments) | Safety Violations: 0 | 2 |
| Safety-constrained Reinforcement Learning | Grid-world Time-Invariant Safety Threshold (100 randomly generated environments) | Safety Violation Count: 0 | 2 |
| Reinforcement Learning | Grid World Npick=5, Sparse (test) | Maximum Average Return: 0.7 | 2 |
| Reinforcement Learning | Grid World Npick=3 Dense (test) | Max Average Return: 3 | 2 |