| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | DeepMath | Pass@1: 70.5 | 44 |
| Mathematical Reasoning | DeepMath 2025 (test) | Pass@1: 55.8 | 32 |
| Mathematical Reasoning | DeepMath | Accuracy: 46.36 | 30 |
| Mathematical Reasoning | DEEPMATH 128 samples | Top-1 Accuracy: 35.93 | 12 |
| Mathematical Reasoning | DeepMath500 | Pass@1 Rate: 69 | 12 |
| Mathematical Reasoning | DeepMath (test) | Pass@1: 62 | 12 |
| Theorem Proving | DeepMath | FR (Fetch Rate): 94 | 8 |
| Mathematical Reasoning | DeepMath 103K subset | Accuracy: 65.9 | 6 |
| Mathematical and General Reasoning | DeepMATH (test) | MATH 500 Score: 83.4 | 4 |