
GSM8K

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy 97.1 | 983 |
| Mathematical Reasoning | GSM8K (test) | Accuracy 99 | 797 |
| Mathematical Reasoning | GSM8K (test) | Accuracy 97.72 | 751 |
| Mathematical Reasoning | GSM8K | Accuracy (GSM8K) 97.8 | 358 |
| Mathematical Reasoning | GSM8K | Accuracy 97.04 | 351 |
| Mathematical Reasoning | GSM8k | Accuracy 96.21 | 212 |
| Mathematical Reasoning | GSM8K | Speed Up (x) 10.72 | 177 |
| Mathematical Reasoning | GSM8K | Math Score 96.4 | 171 |
| Arithmetic Reasoning | GSM8K | Accuracy 97.1 | 155 |
| Math Reasoning | GSM8K (test) | Accuracy 94.5 | 155 |
| Arithmetic Reasoning | GSM8K (test) | Accuracy 97.35 | 129 |
| Math Reasoning | GSM8K | Accuracy 93.8 | 126 |
| Mathematical Reasoning | GSM8K | EM 97.04 | 115 |
| Mathematical Reasoning | GSM8K | pass@1 96.7 | 102 |
| Math Word Problem Solving | GSM8K | Accuracy 96.8 | 91 |
| Math | GSM8K | Accuracy 0.95 | 87 |
| Reasoning | GSM8K | Accuracy 1 | 83 |
| Mathematical Reasoning | GSM8K (test) | Accuracy 89.2 | 79 |
| Mathematical Reasoning | GSM8K (val) | Accuracy 90.8 | 67 |
| Mathematical Reasoning | GSM8K (test) | HS 59.6 | 62 |
| Mathematical Reasoning | GSM8K | Accuracy 89.15 | 57 |
| Mathematical reasoning | GSM8K | Tau ($\tau$) 5.39 | 54 |
| Hallucination Detection | GSM8K | AUROC 90.37 | 53 |
| Math Word Problem Solving | GSM8K official 1.3k set (test) | Accuracy 93.7 | 53 |
| Hallucination Detection | GSM8K (test) | AUROC (Reference) 79.01 | 48 |
Showing 25 of 290 rows
...