| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | BRUMO 2025 | Accuracy 93.3 | 52 |
| Mathematical Reasoning | BrUMO '25 | Rank 1 | 40 |
| Mathematical Reasoning | BRUMO (DEF.) | Pass@128 66.67 | 30 |
| Reasoning | BRUMO 2025 | Accuracy 60.83 | 21 |
| Reasoning | BRUMO25 | Avg@k Score 78.75 | 20 |
| Mathematical Reasoning | BRUMO | Trace Count 826 | 20 |
| Reasoning | Brumo 25 | Trace Count 613 | 20 |
| Mathematical Reasoning | BRUMO25 | Pass@1 54.4 | 18 |
| Mathematical Reasoning | Brumo | Pass@1 Accuracy 81.9 | 15 |
| Reasoning | BrUMO25 | Pass@1 94.58 | 14 |
| Mathematical Reasoning | BRUMO | Accuracy 80 | 12 |
| Mathematical Reasoning | BRUMO 2025 | PASS@1 69.48 | 11 |
| Mathematical Reasoning | Brumo 2025 | Accuracy 54.79 | 10 |
| Math Reasoning | BRUMO 2025 | Pass@1 30.4 | 8 |
| Mathematical Reasoning | Brumo 2025 | Pass@1 96.4 | 8 |
| Mathematical Reasoning | BRUMO 2025 | Avg@32 Score 63.3 | 8 |
| Math Reasoning | BRUMO'25 | Pass@1 47.39 | 4 |
| Mathematical Reasoning | BRUMO 2025 | Majority@2048 23.3 | 4 |
| Mathematical Reasoning | BRUMO | REST Score 73.3 | 4 |
| Mathematical Reasoning | BRUMO 2025 (test) | Pass@1 Accuracy 56.66 | 4 |
| Mathematical Reasoning | BRUMO 2025 | Pass@1 86.2 | 3 |
| Math Reasoning | BRUMO 2025 | Pass@1 4.79 | 2 |
| Mathematical Reasoning | BRUMO 2025 | Pass@4 51.42 | 2 |
| Ranking Method Evaluation | BrUMO 25 | Mean Kendall's tau_b 0.954 | 1 |
| Ranking Correlation Analysis | BrUMO'25 | Kendall's tau_b (vs. Gold Standard) 0.858 | 1 |