
MathQA

Benchmarks

| Task name | Dataset | Metric | SOTA result | Trend |
|---|---|---|---|---|
| Mathematical reasoning | MathQA | Accuracy | 98.84 | 354 |
| Mathematical Reasoning | MathQA (test) | Accuracy | 87.6 | 52 |
| Question Answering | MathQA (test) | Accuracy | 81.05 | 41 |
| Question Answering | MathQA | Accuracy | 78.7 | 36 |
| Math Word Problem solving | MathQA (test) | Accuracy | 81.5 | 34 |
| Mathematical Reasoning | MathQA | Retention | 25.19 | 28 |
| Zero-shot Reasoning | MathQA | Accuracy | 28.4 | 26 |
| Reasoning | MathQA | CACC | 75.9 | 25 |
| Mathematical Reasoning | MathQA OOD (test) | Accuracy | 63 | 24 |
| Mathematical Reasoning | MathQA | Accuracy | 44 | 20 |
| Correctness Prediction | MathQA | Accuracy | 66.15 | 18 |
| Reasoning | MathQA leave-one-out setup | Average Accuracy | 56.9 | 12 |
| Mathematical Reasoning | MathQA | Average Acceptance Length τ | 2,555 | 12 |
| mathematical computation | MathQA | Exact Match (EM) | 52.34 | 10 |
| Mathematical Question Answering | MathQA | Accuracy | 64.55 | 8 |
| Calibration | MathQA | ECE' Gain | 0.344 | 8 |
| Math Programming | MathQA Python | Pass@80 | 87.4 | 8 |
| Downstream Task | MathQA | Accuracy | 24.32 | 7 |
| Mathematical Reasoning | MathQA | Exact Match | 52.4 | 6 |
| Numerical Question Answering | MathQA (test) | Program Accuracy | 83 | 6 |
| Common Sense Reasoning | MathQA | Accuracy | 64 | 4 |
| Code Generation | MathQA Python Original (test) | Pass@80 | 84.7 | 4 |
| CoT Soundness Evaluation | MathQA | CSR | 92 | 3 |
| CoT Naturalness | MathQA | PPL | 22.1 | 3 |
| Code Generation | MathQA | Normalized Performance | 100.79 | 3 |
Showing 25 of 28 rows