| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Confidence Calibration | BeyondAIME (test) | SNR Gain | 1.202 | 15 |
| Reasoning | BeyondAIME | Pass@1 | 70.38 | 14 |
| Mathematics | BeyondAIME | Avg@10 | 66.56 | 9 |
| Mathematical Reasoning | BeyondAIME | avg@16 | 61.7 | 8 |
| Claim-level Confidence Calibration | BeyondAIME | SNR Gain | 0.301 | 7 |
| Mathematical Reasoning | BeyondAIME | Mean@10 | 71.8 | 4 |