| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | Omni-MATH | Accuracy | 66.9 | 68 |
| Mathematical Reasoning | Omni-Math | Average Score @8 | 25.34 | 14 |
| Math | Omni-MATH | Score | 54.1 | 10 |
| Reasoning Episode Classification | Omni-MATH human-annotated Reasoning episodes (gold set) | Accuracy | 86.33 | 8 |
| Mathematical & Symbolic Reasoning | Omni-MATH Tier 2 | Success Rate (SR) | 42.7 | 6 |
| Difficulty Correlation with LLM Performance | Omni-Math | Pearson PCC | 0.91 | 4 |
| Reasoning Episode Classification | Omni-MATH Non-Reasoning episodes (human-annotated gold set) | Accuracy | 89.34 | 4 |
| Difficulty Correlation with Human Labels | Omni-Math n=1876 | Pearson Correlation | 0.82 | 2 |