| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | NuminaMath | Accuracy | 78.67 | 20 |
| Mathematical Reasoning | NuminaMath | Math Accuracy | 62.45 | 18 |
| Mathematical Reasoning | NuminaMath (val) | Accuracy | 21.7 | 8 |
| Reward Prediction | NuminaMath (out-of-domain) | Accuracy | 42.22 | 6 |
| Mathematical Reasoning | NuminaMath (5,000-sample subset) | Accuracy | 73.94 | 3 |
| Mathematical Reasoning | NuminaMath | Accuracy | 60.9 | 1 |