Our new X account is live! Follow @wizwand_team for updates
SOTA Math Word Problem Solving benchmarks and papers with code | Wizwand
Math Word Problem Solving
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
|---|---|---|---|---|---|
| GSM8K | Llama-3.1-405B | Accuracy | 96.8 | 91 | 4d ago |
| Math23K (test) | Multi-view | Accuracy | 87.1 | 73 | 4d ago |
| Math23K (5-fold cross-val) | Multi-view | Accuracy | 85.2 | 56 | 4d ago |
| GSM8K official 1.3k set (test) | CAPO | Accuracy | 93.7 | 53 | 4d ago |
| SOMADHAN (test) | GPT-OSS-20B | Accuracy | 0.88 | 45 | 4d ago |
| SVAMP | NPG-Muse-8B | Value Accuracy | 94.5 | 38 | 3d ago |
| MathQA (test) | Expression Tree Decoding Strategy (Ours) | Accuracy | 81.5 | 34 | 4d ago |
| GSM8K official train (dev) | CoT | Accuracy | 88.3 | 28 | 4d ago |
| MAWPS (5-fold cross-val) | Multi-view | Accuracy | 92.3 | 21 | 4d ago |
| SVAMP English (test) | ATHENA | Accuracy | 67.8 | 20 | 4d ago |
| Math23K | Graph2Tree | Accuracy | 0.7481 | 19 | 4d ago |
| GSM8K 200 examples (test) | ChatGPT (gpt-3.5-turbo) | Accuracy | 81.2 | 18 | 4d ago |
| Math Word Problems 6 tasks | MIDAS | Accuracy | 43.1 | 14 | 4d ago |
| PARAMAWPS | DeBERTa (VM) | Value Accuracy | 79.1 | 14 | 3d ago |
| MAWPS original (whole dataset) | DeBERTa (PM + VM) | Value Accuracy | 91 | 14 | 3d ago |
| MAWPS English (test) | ATHENA | Accuracy | 93 | 10 | 4d ago |
| Illinois (IL) 562 single-step word problems (5-fold cross-validation) | Giving BERT a Calculator | Accuracy | 83.2 | 10 | 4d ago |
| UnbiasedMWP 1:N Chinese (test) | ATHENA | Accuracy | 48.4 | 8 | 4d ago |
| UnbiasedMWP Chinese (test) | ATHENA | Accuracy | 42 | 8 | 4d ago |
| Math23k Chinese (test) | ATHENA | Accuracy | 86.5 | 8 | 4d ago |
| ASDiv-A (5-fold cross-val) | MSAT-DEDUCTREASONER | Accuracy | 87.5 | 7 | 4d ago |
| GSM+ v1 (test) | Full-FT | Accuracy | 65.7 | 6 | 4d ago |
| SVAMP hard (test) | MSAT-DEDUCTREASONER | Accuracy | 48.2 | 6 | 4d ago |
| Ape210K (test) | REAL | Accuracy | 77.18 | 4 | 4d ago |
| CM17K (test) | NS-Solver | Accuracy | 54.05 | 4 | 4d ago |
Showing 25 of 28 rows