Omni-MATH

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | Omni-MATH | Accuracy | 66.9 | 123 |
| Mathematical Reasoning | Omni-MATH | ECE | 0.0883 | 28 |
| Mathematical Reasoning | OMNI-MATH | Overall Accuracy | 39.55 | 25 |
| Mathematical Reasoning | Omni-Math | Accuracy | 36.5 | 23 |
| Mathematical Reasoning | Omni-MATH | Avg@4 Accuracy | 28.16 | 18 |
| Mathematical Problem Solving | Omni-MATH | Best-of-N Accuracy | 35.4 | 17 |
| Mathematical Problem Solving | Omni-MATH | AUTC | 1,046.36 | 17 |
| Mathematical Reasoning | Omni-MATH | Algebra Accuracy | 37 | 16 |
| Mathematical Reasoning | Omni-Math | Average Score @8 | 25.34 | 14 |
| Answer Verification | Omni-MATH terminal answers | AUROC | 0.9286 | 11 |
| Mathematical Reasoning | Omni-MATH | ECE | 8.67 | 11 |
| Answer Verification | Omni-MATH | AUROC | 0.8591 | 11 |
| Next-token reasoning | OMNI-MATH Hard (val) | Accuracy | 38.1 | 10 |
| Next-token reasoning | OMNI-MATH Medium (val) | Accuracy (Next-token Reasoning) | 61.15 | 10 |
| Next-token reasoning | OMNI-MATH Easy (val) | Accuracy | 76.89 | 10 |
| Math | Omni-MATH | Score | 54.1 | 10 |
| Data Contamination Detection | Omni-MATH Dataset C | Score (Reference) | 23.22 | 8 |
| Mathematical Problem Solving | Omni-MATH 4,415 problems (Full Set) | Accuracy | 64 | 8 |
| Reasoning Episode Classification | Omni-MATH human-annotated Reasoning episodes (gold set) | Accuracy | 86.33 | 8 |
| Mathematical & Symbolic Reasoning | Omni-MATH Tier 2 | Success Rate (SR) | 42.7 | 6 |
| Ranking | Omni-MATH | Correlation | 79.1 | 5 |
| Data Contamination Detection | Omni-MATH (Dataset U) | Reference Score (S) | 15.85 | 4 |
| Mathematical Problem Solving | Omni-MATH Rule 2,821 problems | Accuracy | 69.7 | 4 |
| Mathematical Reasoning | Omni-MATH | Accuracy (Omni-MATH) | 32.2 | 4 |
| Difficulty Correlation with LLM Performance | Omni-Math | Pearson PCC | 0.91 | 4 |
Showing 25 of 29 rows