Arithmetic Reasoning on MAWPS (5-fold cross val)
Top result: 94.3 Accuracy (MSAT-DEDUCTREASONER)
[Chart: results over time, Dec 16, 2022 to Jun 2, 2023. Metrics: Accuracy, Accuracy (with calculator), Delta.]
Updated 3d ago
Evaluation Results
Method                                              | Configuration             | Date    | Accuracy | Accuracy (with calculator) | Delta
MSAT-DEDUCTREASONER                                 | Input Configuration=di... | 2023.06 | 94.3     | -                          | 2.3
Large language models w/ Chain-of-Thought prompting | Backbone=PaLM 540B, Pr... | 2023.06 | 93.3     | -                          | -
PaLM 540B (CoT 8-shot)                              | Model=PaLM 540B, Finet... | 2022.12 | 93       | 93.66                      | -
DEDUCTREASONER                                      | Input Configuration=sy... | 2023.06 | 92       | -                          | -
MSAT-ROBERTAGEN                                     | Input Configuration=di... | 2023.06 | 91.6     | -                          | 3.2
DEDUCTREASONER                                      | Input Configuration=di... | 2023.06 | 91.6     | -                          | -0.4
ROBERTAGEN                                          | Input Configuration=sy... | 2023.06 | 88.4     | -                          | -
ROBERTAGEN                                          | Input Configuration=di... | 2023.06 | 84.1     | -                          | -4.3
T5 XXL (CoT Finetuned)                              | Model=T5 XXL, Finetuni... | 2022.12 | 70.41    | 88.22                      | -
T5 XXL (Baseline)                                   | Model=T5 XXL, Finetuni... | 2022.12 | 54.15    | -                          | -
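The page does not say how Delta is computed. The listed values are, however, consistent with each row's Accuracy minus the Accuracy of the same base model's "Input Configuration=sy..." row. The sketch below checks that reading against the numbers above. The pairing in `baseline_of` and the function name `delta` are assumptions made for illustration, and the truncated configuration labels are kept as they appear on the page.

```python
# Accuracy values copied from the results table, keyed by
# (method, truncated Input Configuration label as shown on the page).
accuracy = {
    ("MSAT-DEDUCTREASONER", "di..."): 94.3,
    ("DEDUCTREASONER", "sy..."): 92.0,
    ("DEDUCTREASONER", "di..."): 91.6,
    ("MSAT-ROBERTAGEN", "di..."): 91.6,
    ("ROBERTAGEN", "sy..."): 88.4,
    ("ROBERTAGEN", "di..."): 84.1,
}

# ASSUMPTION (not stated on the page): each row with a Delta is compared
# against the "sy..." row of its base model.
baseline_of = {
    ("MSAT-DEDUCTREASONER", "di..."): ("DEDUCTREASONER", "sy..."),
    ("DEDUCTREASONER", "di..."): ("DEDUCTREASONER", "sy..."),
    ("MSAT-ROBERTAGEN", "di..."): ("ROBERTAGEN", "sy..."),
    ("ROBERTAGEN", "di..."): ("ROBERTAGEN", "sy..."),
}


def delta(row):
    """Accuracy difference between a row and its assumed baseline, to one decimal."""
    return round(accuracy[row] - accuracy[baseline_of[row]], 1)


if __name__ == "__main__":
    for row in baseline_of:
        print(row[0], row[1], delta(row))
```

Under this assumption the computed differences match the four Delta values in the table (2.3, -0.4, 3.2, -4.3); rows without a Delta are the baselines themselves or have no paired configuration listed.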