| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | BrUMO '25 | Rank | 1 | 40 |
| Reasoning | BRUMO 2025 | Accuracy | 60.83 | 21 |
| Mathematical Reasoning | BRUMO | Trace Count | 826 | 20 |
| Reasoning | Brumo 25 | Trace Count | 613 | 20 |
| Mathematical Reasoning | BRUMO25 | Pass@1 | 54.4 | 18 |
| Reasoning | BrUMO25 | Pass@1 | 94.58 | 14 |
| Mathematical Reasoning | BRUMO 2025 | Pass@1 | 69.48 | 11 |
| Mathematical Reasoning | BRUMO | Accuracy | 67.5 | 7 |
| Mathematical Reasoning | BRUMO 2025 (test) | Pass@1 Accuracy | 56.66 | 4 |
| Mathematical Reasoning | BRUMO 2025 | Pass@4 | 51.42 | 2 |
| Ranking Method Evaluation | BrUMO 25 | Mean Kendall's tau_b | 0.954 | 1 |
| Ranking Correlation Analysis | BrUMO'25 | Kendall's tau_b (vs. Gold Standard) | 0.858 | 1 |