Omni-MATH

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | Omni-MATH | Accuracy: 66.9 | 93 |
| Mathematical Reasoning | Omni-MATH | ECE: 0.0883 | 28 |
| Mathematical Reasoning | OMNI-MATH | Overall Accuracy: 39.55 | 25 |
| Mathematical Problem Solving | Omni-MATH | Best-of-N Accuracy: 35.4 | 17 |
| Mathematical Problem Solving | Omni-MATH | AUTC: 1,046.36 | 17 |
| Mathematical Reasoning | Omni-MATH | Algebra Accuracy: 37 | 16 |
| Mathematical Reasoning | Omni-Math | Average Score @8: 25.34 | 14 |
| Answer Verification | Omni-MATH terminal answers | AUROC: 0.9286 | 11 |
| Mathematical Reasoning | Omni-MATH | ECE: 8.67 | 11 |
| Answer Verification | Omni-MATH | AUROC: 0.8591 | 11 |
| Next-token reasoning | OMNI-MATH Hard (val) | Accuracy: 38.1 | 10 |
| Next-token reasoning | OMNI-MATH Medium (val) | Accuracy (Next-token Reasoning): 61.15 | 10 |
| Next-token reasoning | OMNI-MATH Easy (val) | Accuracy: 76.89 | 10 |
| Math | Omni-MATH | Score: 54.1 | 10 |
| Reasoning Episode Classification | Omni-MATH human-annotated Reasoning episodes (gold set) | Accuracy: 86.33 | 8 |
| Mathematical & Symbolic Reasoning | Omni-MATH Tier 2 | Success Rate (SR): 42.7 | 6 |
| Ranking | Omni-MATH | Correlation: 79.1 | 5 |
| Mathematical Reasoning | Omni-MATH | Accuracy (Omni-MATH): 32.2 | 4 |
| Difficulty Correlation with LLM Performance | Omni-Math | Pearson PCC: 0.91 | 4 |
| Reasoning Episode Classification | Omni-MATH Non-Reasoning episodes (human-annotated gold set) | Accuracy: 89.34 | 4 |
| Mathematical Problem Solving | Omni-MATH | ECE: 13 | 3 |
| Mathematical Reasoning | Omni-MATH | Accuracy (Best-of-64): 35.4 | 3 |
| Difficulty Correlation with Human Labels | Omni-Math n=1876 | Pearson Correlation: 0.82 | 2 |