| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| Other tasks (test) | LM-Cocktail10 | Score 60.28 | 36 | 1mo ago | |
| Countdown and OOD Tasks Overall (test) | R1 Distill -> GRPO | Accuracy 35.9 | 10 | 1mo ago | |
| StoryCloze, OpenQA, ARC-E, ARC-C combined | Trajectory | Average Accuracy 87.76 | 8 | 1mo ago | |