SOTA Math Word Problem Solving benchmarks and papers with code | Wizwand
Math Word Problem Solving
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| GSM8K | Llama-3.1-405B | Accuracy | 96.8 | 111 | 1mo ago |
| GSM8K | Oracle | Accuracy | 95.3 | 87 | 3d ago |
| Math23K (test) | Multi-view | Accuracy | 87.1 | 73 | 1mo ago |
| Math23K (5-fold cross-val) | Multi-view | Accuracy | 85.2 | 56 | 1mo ago |
| GSM8K official 1.3k set (test) | CAPO | Accuracy | 93.7 | 53 | 1mo ago |
| SOMADHAN (test) | GPT-OSS-20B | Accuracy | 0.88 | 45 | 1mo ago |
| SVAMP | NPG-Muse-8B | Value Accuracy | 94.5 | 38 | 1mo ago |
| MathQA (test) | Expression Tree Decoding Strategy (Ours) | Accuracy | 81.5 | 34 | 1mo ago |
| GSM8K official train (dev) | CoT | Accuracy | 88.3 | 28 | 1mo ago |
| MAWPS (5-fold cross-val) | Multi-view | Accuracy | 92.3 | 21 | 1mo ago |
| SVAMP English (test) | ATHENA | Accuracy | 67.8 | 20 | 1mo ago |
| Math23K | Graph2Tree | Accuracy | 0.7481 | 19 | 1mo ago |
| GSM8K 200 examples (test) | ChatGPT (gpt-3.5-turbo) | Accuracy | 81.2 | 18 | 1mo ago |
| Math Word Problems 6 tasks | MIDAS | Accuracy | 43.1 | 14 | 1mo ago |
| PARAMAWPS | DeBERTa (VM) | Value Accuracy | 79.1 | 14 | 1mo ago |
| MAWPS original (whole dataset) | DeBERTa (PM + VM) | Value Accuracy | 91 | 14 | 1mo ago |
| MAWPS English (test) | ATHENA | Accuracy | 93 | 10 | 1mo ago |
| Illinois (IL) 562 single-step word problems (5-fold cross-validation) | Giving BERT a Calculator | Accuracy | 83.2 | 10 | 1mo ago |
| UnbiasedMWP 1:N Chinese (test) | ATHENA | Accuracy | 48.4 | 8 | 1mo ago |
| UnbiasedMWP Chinese (test) | ATHENA | Accuracy | 42 | 8 | 1mo ago |
| Math23k Chinese (test) | ATHENA | Accuracy | 86.5 | 8 | 1mo ago |
| ASDiv-A (5-fold cross-val) | MSAT-DEDUCTREASONER | Accuracy | 87.5 | 7 | 1mo ago |
| GSM+ v1 (test) | Full-FT | Accuracy | 65.7 | 6 | 1mo ago |
| SVAMP hard (test) | MSAT-DEDUCTREASONER | Accuracy | 48.2 | 6 | 1mo ago |
| GSM8K | Qwen3 30B | Accuracy | 0.9636 | 5 | 1mo ago |
Showing 25 of 32 rows