| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | AMC23 | Pass@k: 98.6 | 35 |
| Mathematical Reasoning | AMC23 decontaminated | Accuracy: 69.8 | 14 |
| Mathematical Reasoning | AMC23 | Average Score @32: 91.4 | 14 |
| Mathematical Reasoning | AMC23 | Accuracy: 83.2 | 12 |
| Multi-Turn Tool-Integrated Reasoning (TIR) | AMC23 | Peak avg@32 Score: 79.45 | 6 |
| Competition Mathematics Reasoning | AMC23 | Full Length: 10.9 | 4 |