Arithmetic Reasoning on Combined Math Datasets (SVAMP, GSM8K, AddSub, MultiArith, AQUA, SingleEq)
[Chart: Average Score over time. Best result: DUP, 92.9 Average Score. Date shown: Apr 23, 2024.]
Evaluation Results

| Method | Configuration | Date | Average Score | Delta Value |
|---|---|---|---|---|
| DUP | Model=GPT-4, Evaluatio... | 2024.04 | 92.9 | 2.3 |
| Zero-shot PS+ | Model=GPT-4, Evaluatio... | 2024.04 | 91.4 | 0.8 |
| Zero-shot CoT | Model=GPT-4, Evaluatio... | 2024.04 | 90.6 | - |
| Least-to-Most | Model=GPT-4, Evaluatio... | 2024.04 | 89.7 | -0.9 |
| DUP | Model=GPT-3.5-Turbo, E... | 2024.04 | 84.9 | 4.0 |
| Auto-CoT | Model=GPT-3.5-Turbo, E... | 2024.04 | 83.4 | 2.5 |
| Least-to-Most | Model=GPT-3.5-Turbo, E... | 2024.04 | 82.6 | 1.7 |
| Manual-CoT | Model=GPT-3.5-Turbo, E... | 2024.04 | 82.6 | 1.7 |
| Zero-shot PS+ | Model=GPT-3.5-Turbo, E... | 2024.04 | 81.2 | 0.3 |
| Zero-shot CoT | Model=GPT-3.5-Turbo, E... | 2024.04 | 80.9 | - |
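
The page does not define Delta Value. The listed numbers are consistent with each method's Average Score minus the Zero-shot CoT score for the same model, which is the row marked "-". The sketch below recomputes the column under that assumption. The `scores` dictionary, the `BASELINE` constant, and the `deltas` helper are illustrative names, not part of the leaderboard.

```python
# Average Scores copied from the results listed above, grouped by model.
scores = {
    "GPT-4": {
        "DUP": 92.9,
        "Zero-shot PS+": 91.4,
        "Zero-shot CoT": 90.6,
        "Least-to-Most": 89.7,
    },
    "GPT-3.5-Turbo": {
        "DUP": 84.9,
        "Auto-CoT": 83.4,
        "Least-to-Most": 82.6,
        "Manual-CoT": 82.6,
        "Zero-shot PS+": 81.2,
        "Zero-shot CoT": 80.9,
    },
}

# Assumption: the baseline is the row whose Delta Value is shown as "-".
BASELINE = "Zero-shot CoT"


def deltas(model_scores, baseline=BASELINE):
    """Return each method's score minus the baseline score, rounded to 0.1."""
    base = model_scores[baseline]
    return {
        method: round(score - base, 1)
        for method, score in model_scores.items()
        if method != baseline
    }


for model, model_scores in scores.items():
    for method, delta in deltas(model_scores).items():
        print(f"{model:<14} {method:<14} {delta:+.1f}")
```

Rounding to one decimal place matches the precision of the listed scores and avoids floating-point noise in the subtraction.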