
Math Reasoning Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | Math Reasoning Suite Average | Average Accuracy: 63 | 49 |
| Mathematical Reasoning | Math Reasoning Suite Arithmetic Mean | Average Score (@16): 78.1 | 15 |
| Mathematical Reasoning | Math Reasoning Suite MATH, GSM, AIME | MATH Score: 80.1 | 14 |
| Mathematical Reasoning | Math Reasoning Suite (Math500, AIME24, AMC, Gaokao, Minerva Math, Olympiad Bench) | Math500 Score: 93.2 | 12 |
| Mathematical Reasoning | Math Reasoning Suite AIME, AMC, MATH, Minerva, Olympiad (test) | Avg@8 (AIME): 21.4 | 8 |
| Mathematical Reasoning | Math Reasoning Suite (MATH, AIME25, AMC, MINERVA, KAOYAN, OLYMPIAD, CN_MATH24) | MATH Accuracy: 92.2 | 6 |
| Mathematical Reasoning | Math Reasoning Suite (GSM8K, MATH500, AIME25, AIME26, AMC, Olympiad) (test) | GSM8K Score: 93.6 | 4 |