| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Other tasks (test) | LM-Cocktail10 | Score | 60.28 | 36 | 3mo ago |
| Multi-task Overall Average | | Accuracy | 40.5 | 11 | 5d ago |
| Countdown and OOD Tasks Overall (test) | R1 Distill -> GRPO | Accuracy | 35.9 | 10 | 3mo ago |
| StoryCloze, OpenQA, ARC-E, ARC-C combined | Trajectory | Average Accuracy | 87.76 | 8 | 3mo ago |
| Real Hardware 4 tasks (untrained) | GRaD-Nav++ | Stage 1 Success Rate | 9 | 4 | 15d ago |
| Real Hardware 8 tasks (train) | GRaD-Nav++ | Stage 1 Success Rate | 0.875 | 4 | 15d ago |