
Math-domain reasoning benchmarks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | Math-domain reasoning benchmarks (MATH-500, Olympiad, Minerva, GSM8K, AMC, AIME24) (test) | Overall Score: 58.12 | 20 |
| Mathematical Reasoning | Math-domain reasoning benchmarks (GSM8K, MATH, MathQA) MathPile (test) | GSM8K Accuracy: 49.36 | 8 |