SOTA Math Word Problem Solving benchmarks and papers with code | Wizwand
Math Word Problem Solving
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
|---|---|---|---|---|---|
| GSM8K | Oracle | Accuracy | 95.3 | 158 | 15d ago |
| GSM8K | Llama-3.1-405B | Accuracy | 96.8 | 111 | 2mo ago |
| Math23K (test) | Multi-view | Accuracy | 87.1 | 73 | 3mo ago |
| Math23K (5-fold cross-val) | Multi-view | Accuracy | 85.2 | 56 | 3mo ago |
| GSM8K official 1.3k set (test) | CAPO | Accuracy | 93.7 | 53 | 3mo ago |
| SOMADHAN (test) | GPT-OSS-20B | Accuracy | 0.88 | 45 | 3mo ago |
| SVAMP | NPG-Muse-8B | Value Accuracy | 94.5 | 38 | 3mo ago |
| MathQA (test) | Expression Tree Decoding Strategy (Ours) | Accuracy | 81.5 | 34 | 3mo ago |
| GSM8K official train (dev) | CoT | Accuracy | 88.3 | 28 | 3mo ago |
| MAWPS (5-fold cross-val) | Multi-view | Accuracy | 92.3 | 21 | 3mo ago |
| SVAMP English (test) | ATHENA | Accuracy | 67.8 | 20 | 3mo ago |
| Math23K | Graph2Tree | Accuracy | 0.7481 | 19 | 3mo ago |
| GSM8K (test) | MO-CAPO | Accuracy | 89.3 | 18 | 14d ago |
| GSM8K 200 examples (test) | ChatGPT (gpt-3.5-turbo) | Accuracy | 81.2 | 18 | 3mo ago |
| Math Word Problems 6 tasks | MIDAS | Accuracy | 43.1 | 14 | 3mo ago |
| PARAMAWPS | DeBERTa (VM) | Value Accuracy | 79.1 | 14 | 3mo ago |
| MAWPS original (whole dataset) | DeBERTa (PM + VM) | Value Accuracy | 91 | 14 | 3mo ago |
| MAWPS English (test) | ATHENA | Accuracy | 93 | 10 | 3mo ago |
| Illinois (IL) 562 single-step word problems (5-fold cross-validation) | Giving BERT a Calculator | Accuracy | 83.2 | 10 | 3mo ago |
| UnbiasedMWP 1:N Chinese (test) | ATHENA | Accuracy | 48.4 | 8 | 3mo ago |
| UnbiasedMWP Chinese (test) | ATHENA | Accuracy | 42 | 8 | 3mo ago |
| Math23k Chinese (test) | ATHENA | Accuracy | 86.5 | 8 | 3mo ago |
| SVAMP 10 low-resourced languages FLORES-200 (test) | DIP | Kazakh (Cyrillic) Accuracy | 44 | 7 | 13d ago |
| ASDiv-A (5-fold cross-val) | MSAT-DEDUCTREASONER | Accuracy | 87.5 | 7 | 3mo ago |
| GSM+ v1 (test) | Full-FT | Accuracy | 65.7 | 6 | 3mo ago |
Showing 25 of 35 rows