| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Mathematical Reasoning | BeyondAIME | avg@16: 61.7 | 23 | |
| Mathematical Reasoning | BeyondAIME | Accuracy: 82.5 | 18 | |
| Confidence Calibration | BeyondAIME (test) | SNR Gain: 1.202 | 15 | |
| Reasoning | BeyondAIME | Pass@1: 70.38 | 14 | |
| Mathematics | BeyondAIME | Avg@10: 66.56 | 9 | |
| Mathematical Reasoning | BeyondAIME | Pass@1: 8.3 | 8 | |
| Claim-level Confidence Calibration | BeyondAIME | SNR Gain: 0.301 | 7 | |
| Tool-integrated Mathematical Reasoning | BeyondAIME | Pass@1: 41 | 6 | |
| Mathematical Reasoning | BeyondAIME | Mean@10: 71.8 | 4 | |
| Mathematical Reasoning | BeyondAIME | Pass@16: 27.84 | 3 | |
| Mathematical Reasoning | BeyondAIME | pass@64: 31.3 | 3 | |
| Mathematical Reasoning | BeyondAIME | Turn 1 Score: 2 | 2 | |