| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | Omni-MATH | Accuracy | 66.9 | 93 |
| Mathematical Reasoning | Omni-MATH | ECE | 0.0883 | 28 |
| Mathematical Reasoning | OMNI-MATH | Overall Accuracy | 39.55 | 25 |
| Mathematical Problem Solving | Omni-MATH | Best-of-N Accuracy | 35.4 | 17 |
| Mathematical Problem Solving | Omni-MATH | AUTC | 1,046.36 | 17 |
| Mathematical Reasoning | Omni-MATH | Algebra Accuracy | 37 | 16 |
| Mathematical Reasoning | Omni-Math | Average Score @8 | 25.34 | 14 |
| Answer Verification | Omni-MATH terminal answers | AUROC | 0.9286 | 11 |
| Mathematical Reasoning | Omni-MATH | ECE | 8.67 | 11 |
| Answer Verification | Omni-MATH | AUROC | 0.8591 | 11 |
| Next-token reasoning | OMNI-MATH Hard (val) | Accuracy | 38.1 | 10 |
| Next-token reasoning | OMNI-MATH Medium (val) | Accuracy (Next-token Reasoning) | 61.15 | 10 |
| Next-token reasoning | OMNI-MATH Easy (val) | Accuracy | 76.89 | 10 |
| Math | Omni-MATH | Score | 54.1 | 10 |
| Reasoning Episode Classification | Omni-MATH human-annotated Reasoning episodes (gold set) | Accuracy | 86.33 | 8 |
| Mathematical & Symbolic Reasoning | Omni-MATH Tier 2 | Success Rate (SR) | 42.7 | 6 |
| Ranking | Omni-MATH | Correlation | 79.1 | 5 |
| Mathematical Reasoning | Omni-MATH | Accuracy (Omni-MATH) | 32.2 | 4 |
| Difficulty Correlation with LLM Performance | Omni-Math | Pearson PCC | 0.91 | 4 |
| Reasoning Episode Classification | Omni-MATH Non-Reasoning episodes (human-annotated gold set) | Accuracy | 89.34 | 4 |
| Mathematical Problem Solving | Omni-MATH | ECE | 13 | 3 |
| Mathematical Reasoning | Omni-MATH | Accuracy (Best-of-64) | 35.4 | 3 |
| Difficulty Correlation with Human Labels | Omni-Math n=1876 | Pearson Correlation | 0.82 | 2 |