| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | MATHEMATICS | Accuracy | 74.1 | 46 |
| Mathematical Reasoning | Mathematics out-of-domain (test) | Accuracy | 75.9 | 30 |
| Mathematical Reasoning | Mathematics | Accuracy | 85.9 | 24 |
| Mathematical Reasoning | Mathematics | Pass@1 | 65.8 | 18 |
| Category Retrieval | Mathematics Amazon (test) | R@50 | 31.4 | 15 |
| Link Prediction | Mathematics | PREC@1 | 71.22 | 14 |
| Reranking | Mathematics | NDCG@5 | 47.1 | 14 |
| Reasoning | Mathematics | Normalized Score | 100 | 9 |
| Mathematics | Mathematics (overall) | Mean Borda Score | 5.1388 | 8 |
| Mathematical Optimization | Mathematics MinMaxMinDist | Score | 4.1658 | 3 |
| Mathematical Optimization | Mathematics Circle-Packing | Score | 2.636 | 3 |
| Language Modeling | Mathematics (val) | Perplexity | 556.73 | 2 |
| Mathematics Evaluation | Mathematics Task | Token Match Rate | 30 | 2 |