Math Benchmarks

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | Math Benchmarks Aggregate | Accuracy (Avg) | 81.9 | 62 |
| Mathematical Reasoning | Math Benchmarks GSM8K, Minerva, MATH, MathQA | GSM8K Score | 59.24 | 53 |
| Mathematical Reasoning | Math Benchmarks Average | Accuracy (ACC) | 76.1 | 47 |
| Mathematical Reasoning | Math Benchmarks Aggregate | Pass@1 | 71.8 | 44 |
| Mathematical Reasoning | Math Benchmarks AIME 2024, AIME 2025, OlympiadBench | AIME 2024 Score | 19.5 | 19 |
| Math Reasoning | Mean of six math benchmarks | Pass@1 | 43.8 | 12 |
| Mathematical Reasoning | Math Benchmarks Overall (test) | Pass@1 | 87 | 12 |
| Math Problem Solving | Math Benchmarks LIMO curation (test) | Accuracy | 72.6 | 10 |
| Math Reasoning | Math Benchmarks MATH, GSM8K, AMC23, AIME24, Minerva, Gaokao, Olympiad (test) | MATH Score | 75.1 | 10 |
| Mathematical Reasoning | Math Benchmarks (GSM8K, MATH, AMC23, AIME24) (test) | Accuracy (GSM8K) | 96 | 8 |
| Mathematical Reasoning | Math Benchmarks Math500, OlympiadBench, Minerva, AIME, AMC | Math500 Accuracy | 85.6 | 7 |
| Mathematical Reasoning | Math Benchmarks evaluated on Llama 3-70B | GSM8K Accuracy | 78.2 | 5 |
| Mathematical Reasoning | Math Benchmarks MATH, MATH500, ThmQA | MATH multi@5 Accuracy | 67.6 | 4 |
| Mathematical Reasoning | Math Benchmarks (test) | GSM8K Accuracy | 28.9 | 3 |