| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Math Reasoning | Math Reasoning Long Q, Long A (test) | Pass@1 0.65 | 15 | |
| Mathematical Reasoning | Math Reasoning Out-domain (SVAMP, Mathematics, SimulEq) (test) | SVAMP Accuracy 79.6 | 8 | |
| Mathematical Reasoning | Math Reasoning In-domain (GSM8K, MATH, NumGLUE) (test) | GSM8K Accuracy 69.1 | 8 | |
| Math Reasoning | Math Reasoning Aggregate | Avg@32 40.08 | 6 | |
| Math Reasoning | Math Reasoning 1.5B model (val) | Validation Accuracy 69.4 | 3 | |