
Math Reasoning

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Preference Modeling | Math Reasoning | Accuracy 87.6 | 20 |
| Math Reasoning | Math Reasoning Long Q, Long A (test) | Pass@1 0.65 | 15 |
| Mathematical Reasoning | Math Reasoning Out-domain (SVAMP, Mathematics, SimulEq) (test) | SVAMP Accuracy 79.6 | 8 |
| Mathematical Reasoning | Math Reasoning In-domain (GSM8K, MATH, NumGLUE) (test) | GSM8K Accuracy 69.1 | 8 |
| Math Reasoning | Math Reasoning Aggregate | Avg@32 40.08 | 6 |
| Preference Classification | Math Reasoning (test) | Classification Accuracy 85.4 | 4 |
| Math Reasoning | Math Reasoning 1.5B model (val) | Validation Accuracy 69.4 | 3 |