
MathQA

Benchmarks

| Task Name                          | Dataset Name                  | Metric                      | SOTA Result | Trend |
|------------------------------------|-------------------------------|-----------------------------|-------------|-------|
| Mathematical reasoning             | MathQA                        | Accuracy                    | 98.84       | 305   |
| Question Answering                 | MathQA (test)                 | Accuracy                    | 81.05       | 41    |
| Math Word Problem solving          | MathQA (test)                 | Accuracy                    | 81.5        | 34    |
| Mathematical Reasoning             | MathQA (test)                 | Accuracy                    | 87.6        | 33    |
| Mathematical Reasoning             | MathQA                        | Retention                   | 25.19       | 28    |
| Zero-shot Reasoning                | MathQA                        | Accuracy                    | 28.4        | 26    |
| Reasoning                          | MathQA                        | CACC                        | 75.9        | 25    |
| Correctness Prediction             | MathQA                        | Accuracy                    | 66.15       | 18    |
| Reasoning                          | MathQA leave-one-out setup    | Average Accuracy            | 56.9        | 12    |
| Mathematical Reasoning             | MathQA                        | Average Acceptance Length τ | 2,555       | 12    |
| Question Answering                 | MathQA                        | Accuracy                    | 78.7        | 12    |
| mathematical computation           | MathQA                        | Exact Match (EM)            | 52.34       | 10    |
| Math Programming                   | MathQA Python                 | Pass@80                     | 87.4        | 8     |
| Downstream Task                    | MathQA                        | Accuracy                    | 24.32       | 7     |
| Numerical Question Answering       | MathQA (test)                 | Program Accuracy            | 83          | 6     |
| Common Sense Reasoning             | MathQA                        | Accuracy                    | 64          | 4     |
| Code Generation                    | MathQA Python Original (test) | Pass@80                     | 84.7        | 4     |
| CoT Soundness Evaluation           | MathQA                        | CSR                         | 92          | 3     |
| CoT Naturalness                    | MathQA                        | PPL                         | 22.1        | 3     |
| Code Generation                    | MathQA                        | Normalized Performance      | 100.79      | 3     |
| Human Evaluation                   | MathQA                        | Accuracy                    | 89.2        | 3     |
| Code Generation                    | MathQA Python Filtered (dev)  | PASS@1                      | 20.7        | 3     |
| Multiple Choice Question Answering | MathQA                        | Accuracy                    | 22.21       | 2     |