SuperGLUE

Benchmarks

| Task | Dataset | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Natural Language Understanding | SuperGLUE (dev) | Average Score | 93.2 | 91 |
| Natural Language Understanding | SuperGLUE | SGLUE Score | 91.3 | 84 |
| Natural Language Understanding | SuperGLUE (test) | BoolQ Accuracy | 92.4 | 63 |
| Natural Language Understanding | SuperGLUE | SST-2 Accuracy | 96 | 18 |
| Natural Language Understanding | SuperGLUE RoBERTa-large (test) | ReCoRD | 89.21 | 17 |
| Natural Language Understanding | SuperGLUE few-shot | BoolQ Accuracy | 0.818 | 16 |
| Natural Language Understanding | SuperGLUE 1,000 examples | BoolQ Accuracy | 84 | 15 |
| Natural Language Understanding | SuperGLUE | WSC Score | 57.69 | 13 |
| Natural Language Processing | SuperGLUE Full, excl. ReCoRD (dev) | Macro Avg Score | 70.03 | 13 |
| Natural Language Processing | SuperGLUE 1k samples, excl. ReCoRD (dev) | Macro Avg Score | 65.84 | 13 |
| Natural Language Processing | SuperGLUE 100 samples, excl. ReCoRD (dev) | Macro Avg Score | 59.88 | 13 |
| Natural Language Understanding | SuperGLUE Zero-shot | BoolQ Accuracy | 88 | 11 |
| Natural Language Understanding | SuperGLUE 1,000 examples (test) | BoolQ | 86.7 | 10 |
| Text Classification | SuperGLUE (val) | Average Validation Score | 89.2 | 10 |
| Failure Diagnosis | SuperGLUE | Macro Similarity Score | 36 | 8 |
| Natural Language Understanding | SuperGLUE v1 (test) | BoolQ Acc | 91.3 | 7 |
| Automated Probing | SuperGLUE | Error Rate | 38 | 3 |