
Mathematical Reasoning Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | Mathematical Reasoning Suite (AMC, AIME 2024, AIME 2025, Minerva, MATH, Olympiad) various (test val) | Average Score 62.1 | 55 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AIME24, AIME25, MATH, MINERVA, GPQA, GSM8K) Standard (test) | AIME24 Score 68.75 | 26 |
| Mathematical Reasoning | Mathematical Reasoning Suite MathQA, GSM8K, AddSub, SingleEq, SVAMP | MathQA Accuracy 91.88 | 24 |
| Mathematical Reasoning | Mathematical Reasoning Suite AIME24, AIME25, AMC23, GSM8K, MATH500 | AIME 2024 Score 25.9 | 15 |
| Mathematical Reasoning | Mathematical Reasoning Suite (A24, A25, AMC, MATH, Minerva) | A24 Score 52.29 | 13 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AIME24, AMC23, MATH, MIN., OLY.) | AIME24 Score 43.3 | 12 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AMC, Minerva, MATH, GSM8K, Olympiad, AIME25, AIME24) | Overall Average Score 56.46 | 12 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AIME24, Math500, Olympiad, AMC, Minerva) | AIME24 Score 60.58 | 2 |