SuperGLUE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Natural Language Understanding | SuperGLUE (dev) | Average Score: 93.2 | 91 |
| Natural Language Understanding | SuperGLUE | SGLUE Score: 91.3 | 84 |
| Natural Language Understanding | SuperGLUE (test) | BoolQ Accuracy: 92.4 | 74 |
| Natural Language Understanding | SuperGLUE | CB Accuracy: 94.5 | 32 |
| Natural Language Understanding | SuperGLUE | WSC Score: 76.9 | 25 |
| Natural Language Understanding | SuperGLUE | MultiRC Score: 75.9 | 22 |
| Natural Language Understanding | SuperGLUE | SST-2 Accuracy: 96 | 18 |
| Natural Language Understanding | SuperGLUE RoBERTa-large (test) | ReCoRD: 89.21 | 17 |
| Natural Language Understanding | SuperGLUE few-shot | BoolQ Accuracy: 0.818 | 16 |
| Natural Language Understanding | SuperGLUE 1,000 examples | BoolQ Accuracy: 84 | 15 |
| Multiple Choice | SuperGLUE | COPA Score: 83 | 14 |
| Classification | SuperGLUE | CB Accuracy: 96.4 | 14 |
| Natural Language Processing | SuperGLUE Full, excl. ReCoRD (dev) | Macro Avg Score: 70.03 | 13 |
| Natural Language Processing | SuperGLUE 1k samples, excl. ReCoRD (dev) | Macro Avg Score: 65.84 | 13 |
| Natural Language Processing | SuperGLUE 100 samples, excl. ReCoRD (dev) | Macro Avg Score: 59.88 | 13 |
| Natural Language Understanding | SuperGLUE (test val) | SST-2 Accuracy: 96 | 12 |
| Natural Language Understanding | SuperGLUE Zero-shot | BoolQ Accuracy: 88 | 11 |
| Natural Language Understanding | SuperGLUE 1,000 examples (test) | BoolQ: 86.7 | 10 |
| Text Classification | SuperGLUE (val) | Average Validation Score: 89.2 | 10 |
| NLU and Question Answering | SuperGLUE | SST-2 Accuracy: 94.7 | 9 |
| Failure Diagnosis | SuperGLUE | Macro Similarity Score: 36 | 8 |
| Natural Language Understanding | SuperGLUE v1 (test) | BoolQ Acc: 91.3 | 7 |
| Natural Language Understanding | SuperGLUE | BoolQ Accuracy: 88.5 | 6 |
| Natural Language Understanding | SuperGLUE | Accuracy (SST2): 94.81 | 6 |
| Classification | SuperGLUE | RTE Score: 72.2 | 6 |

Showing 25 of 28 rows