
Math Reasoning

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Math Reasoning | Math Reasoning Long Q, Long A (test) | Pass@1: 0.65 | 15 |
| Mathematical Reasoning | Math Reasoning Out-domain (SVAMP, Mathematics, SimulEq) (test) | SVAMP Accuracy: 79.6 | 8 |
| Mathematical Reasoning | Math Reasoning In-domain (GSM8K, MATH, NumGLUE) (test) | GSM8K Accuracy: 69.1 | 8 |
| Math Reasoning | Math Reasoning Aggregate | Avg@32: 40.08 | 6 |
| Math Reasoning | Math Reasoning 1.5B model (val) | Validation Accuracy: 69.4 | 3 |