Math Benchmarks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | Math Benchmarks Aggregate | Pass@1: 71.8 | 44 |
| Mathematical Reasoning | Math Benchmarks Aggregate | Accuracy (Avg): 81.9 | 18 |
| Mathematical Reasoning | Math Benchmarks Overall (test) | Pass@1: 87 | 12 |
| Math Reasoning | Math Benchmarks MATH, GSM8K, AMC23, AIME24, Minerva, Gaokao, Olympiad (test) | MATH Score: 75.1 | 10 |
| Mathematical Reasoning | Math Benchmarks (GSM8K, MATH, AMC23, AIME24) (test) | Accuracy (GSM8K): 96 | 8 |
| Mathematical Reasoning | Math Benchmarks (test) | GSM8K Accuracy: 28.9 | 3 |