
SVAMP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | SVAMP | Accuracy 97 | 403 |
| Mathematical Reasoning | SVAMP (test) | Accuracy 94 | 262 |
| Arithmetic Reasoning | SVAMP | Accuracy 96.01 | 61 |
| Arithmetic Reasoning | SVAMP (test) | Accuracy 98.16 | 54 |
| Arithmetic Reasoning | SVAMP | Accuracy (Overall) 93.7 | 54 |
| Mathematical Reasoning | SVAMP out-of-domain (test) | Accuracy 97 | 50 |
| Hallucination Detection | SVAMP | Mean AUROC 78.37 | 48 |
| Math Reasoning | SVAMP | Accuracy 94.2 | 40 |
| Math Word Problem solving | SVAMP | Value Accuracy 94.5 | 38 |
| Mathematical Reasoning | SVAMP (val) | Accuracy 85.1 | 36 |
| Mathematical Reasoning | SVAMP | Pass@1 93.1 | 35 |
| Speculative Sampling | SVAMP | Average Acceptance Length 5.38 | 28 |
| Group Collusive Attack Detection | SVAMP | Detection Accuracy 92 | 27 |
| Mathematical Reasoning | SVAMP 8-shot (test) | Accuracy 92 | 25 |
| Mathematical Reasoning | SVAMP | Accuracy 94.31 | 21 |
| Mathematical Reasoning | SVAMP | AUROC 0.6211 | 20 |
| Math Word Problem solving | SVAMP English (test) | Accuracy 67.8 | 20 |
| Mathematical Reasoning | SVAMP | Pass@5 97 | 16 |
| Inference Attack | SVAMP | AUC 97.61 | 15 |
| Mathematical Reasoning | SVAMP | Accuracy 94.1 | 15 |
| Mathematical Word Problem Solving | SVAMP | Accuracy 96.6 | 14 |
| Mathematical Reasoning | SVAMP | Accuracy 83.33 | 14 |
| Math Reasoning | SVAMP (held-out) | Performance 78.3 | 14 |
| Arithmetic Reasoning | SVAMP latest (test) | Accuracy 64.8 | 14 |
| Mathematical Reasoning | SVAMP | Accuracy 79.53 | 12 |
Showing 25 of 57 rows