
SVAMP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | SVAMP | Accuracy: 97 | 368 |
| Mathematical Reasoning | SVAMP (test) | Accuracy: 94 | 233 |
| Arithmetic Reasoning | SVAMP (test) | Accuracy: 98.16 | 54 |
| Arithmetic Reasoning | SVAMP | Accuracy (Overall): 93.7 | 54 |
| Mathematical Reasoning | SVAMP out-of-domain (test) | Accuracy: 97 | 50 |
| Hallucination Detection | SVAMP | Mean AUROC: 78.37 | 48 |
| Arithmetic Reasoning | SVAMP | Accuracy: 94.2 | 48 |
| Math Word Problem solving | SVAMP | Value Accuracy: 94.5 | 38 |
| Mathematical Reasoning | SVAMP (val) | Accuracy: 85.1 | 36 |
| Mathematical Reasoning | SVAMP | Pass@1: 93.1 | 35 |
| Mathematical Reasoning | SVAMP | Accuracy: 94.31 | 21 |
| Mathematical Reasoning | SVAMP | AUROC: 0.6211 | 20 |
| Math Word Problem solving | SVAMP English (test) | Accuracy: 67.8 | 20 |
| Mathematical Reasoning | SVAMP | Pass@5: 97 | 16 |
| Mathematical Reasoning | SVAMP | Accuracy: 83.33 | 14 |
| Math Reasoning | SVAMP (held-out) | Performance: 78.3 | 14 |
| Arithmetic Reasoning | SVAMP latest (test) | Accuracy: 64.8 | 14 |
| Mathematical Reasoning | SVAMP | Verifiability Score: 97.33 | 12 |
| Mathematical Reasoning | SVAMP | Reusability Score: 71.11 | 12 |
| Arithmetic Reasoning | SVAMP | Accuracy: 69.3 | 12 |
| Math Reasoning | SVAMP | Accuracy: 78.67 | 10 |
| Mathematical Reasoning | SVAMP | Accuracy (Context Size 128): 0.8933 | 9 |
| Mathematical Reasoning | SVAMP | Accuracy: 59.8 | 9 |
| Mathematical Reasoning | SVAMP | Answer Correctness Rate: 53.8 | 8 |
| Uncertainty Estimation | SVAMP | AUROC: 93.6 | 7 |
Showing 25 of 41 rows