Mathematical Reasoning Benchmarks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (GSM8K, MATH, AMC23, Olympiad, Minerva) (test) | GSM8K Accuracy: 94 | 32 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks AIME24, MATH500, Olympiad, AMC, Minerva (test) | AIME24 Score: 33.3 | 22 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (AddSub, AQuA, GSM8k, MultiArith, SingleEq, SVAMP) (test) | Accuracy (AddSub): 96.96 | 18 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks | AIME 24 Score: 59.8 | 16 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (test) | Average Score: 53.72 | 16 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks Average | Pass@4 Score: 71.88 | 14 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks AIME24, AMC23, MATH500 | AIME24 Score: 15 | 6 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (MATH, AIME25, AMC, MINERVA, KAOYAN, OLYMPIAD, CN_MATH24) | MATH Accuracy: 91.1 | 4 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks AIME24, Math500, Olympiad, AMC, Minerva | AIME24 Score: 15.6 | 3 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks AIME 24, AIME 25, AMC, Minerva, MATH-500, Olympiad, Gaokao23 (test) | AIME 2024 Score: 43.3 | 3 |