| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Logical Reasoning | StepGame k=10 | Accuracy: 88.1 | 56 |
| Logical Reasoning | StepGame k=4 | Accuracy: 93.8 | 56 |
| Logical Reasoning | StepGame k=3 | Accuracy: 89.5 | 56 |
| Multi-hop spatial reasoning | StepGame larger k generalization (test) | Accuracy (k=6): 28.53 | 6 |
| Multi-hop spatial reasoning | StepGame with distracting noise (test) | Accuracy (k=1): 85.77 | 6 |