| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | Omni-MATH | Accuracy | 66.9 | 123 |
| Mathematical Reasoning | Omni-MATH | ECE | 0.0883 | 28 |
| Mathematical Reasoning | OMNI-MATH | Overall Accuracy | 39.55 | 25 |
| Mathematical Reasoning | Omni-Math | Accuracy | 36.5 | 23 |
| Mathematical Reasoning | Omni-MATH | Avg@4 Accuracy | 28.16 | 18 |
| Mathematical Problem Solving | Omni-MATH | Best-of-N Accuracy | 35.4 | 17 |
| Mathematical Problem Solving | Omni-MATH | AUTC | 1,046.36 | 17 |
| Mathematical Reasoning | Omni-MATH | Algebra Accuracy | 37 | 16 |
| Mathematical Reasoning | Omni-Math | Average Score @8 | 25.34 | 14 |
| Answer Verification | Omni-MATH terminal answers | AUROC | 0.9286 | 11 |
| Mathematical Reasoning | Omni-MATH | ECE | 8.67 | 11 |
| Answer Verification | Omni-MATH | AUROC | 0.8591 | 11 |
| Next-token reasoning | OMNI-MATH Hard (val) | Accuracy | 38.1 | 10 |
| Next-token reasoning | OMNI-MATH Medium (val) | Accuracy (Next-token Reasoning) | 61.15 | 10 |
| Next-token reasoning | OMNI-MATH Easy (val) | Accuracy | 76.89 | 10 |
| Math | Omni-MATH | Score | 54.1 | 10 |
| Data Contamination Detection | Omni-MATH Dataset C | Score (Reference) | 23.22 | 8 |
| Mathematical Problem Solving | Omni-MATH 4,415 problems (Full Set) | Accuracy | 64 | 8 |
| Reasoning Episode Classification | Omni-MATH human-annotated Reasoning episodes (gold set) | Accuracy | 86.33 | 8 |
| Mathematical & Symbolic Reasoning | Omni-MATH Tier 2 | Success Rate (SR) | 42.7 | 6 |
| Ranking | Omni-MATH | Correlation | 79.1 | 5 |
| Data Contamination Detection | Omni-MATH (Dataset U) | Reference Score (S) | 15.85 | 4 |
| Mathematical Problem Solving | Omni-MATH Rule 2,821 problems | Accuracy | 69.7 | 4 |
| Mathematical Reasoning | Omni-MATH | Accuracy (Omni-MATH) | 32.2 | 4 |
| Difficulty Correlation with LLM Performance | Omni-Math | Pearson PCC | 0.91 | 4 |