Math Benchmarks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | Math Benchmarks GSM8K, Minerva, MATH, MathQA | GSM8K Score 59.24 | 53 |
| Mathematical Reasoning | Math Benchmarks Aggregate | Pass@1 71.8 | 44 |
| Mathematical Reasoning | Math Benchmarks Aggregate | Accuracy (Avg) 81.9 | 40 |
| Mathematical Reasoning | Math Benchmarks Average | Accuracy (ACC) 76.1 | 35 |
| Math Reasoning | Mean of six math benchmarks | Pass@1 43.8 | 12 |
| Mathematical Reasoning | Math Benchmarks Overall (test) | Pass@1 87 | 12 |
| Math Problem Solving | Math Benchmarks LIMO curation (test) | Accuracy 72.6 | 10 |
| Math Reasoning | Math Benchmarks MATH, GSM8K, AMC23, AIME24, Minerva, Gaokao, Olympiad (test) | MATH Score 75.1 | 10 |
| Mathematical Reasoning | Math Benchmarks (GSM8K, MATH, AMC23, AIME24) (test) | Accuracy (GSM8K) 96 | 8 |
| Mathematical Reasoning | Math Benchmarks (test) | GSM8K Accuracy 28.9 | 3 |