SuperGLUE

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Natural Language Understanding | SuperGLUE (dev) | Average Score | 93.2 | 91 |
| Natural Language Understanding | SuperGLUE | SGLUE Score | 91.3 | 84 |
| Natural Language Understanding | SuperGLUE (test) | BoolQ Accuracy | 92.4 | 63 |
| Natural Language Understanding | SuperGLUE | CB Accuracy | 94.5 | 32 |
| Natural Language Understanding | SuperGLUE | MultiRC Score | 75.9 | 22 |
| Natural Language Understanding | SuperGLUE | SST-2 Accuracy | 96 | 18 |
| Natural Language Understanding | SuperGLUE RoBERTa-large (test) | ReCoRD | 89.21 | 17 |
| Natural Language Understanding | SuperGLUE few-shot | BoolQ Accuracy | 0.818 | 16 |
| Natural Language Understanding | SuperGLUE 1,000 examples | BoolQ Accuracy | 84 | 15 |
| Natural Language Understanding | SuperGLUE | WSC Score | 57.69 | 13 |
| Natural Language Processing | SuperGLUE Full, excl. ReCoRD (dev) | Macro Avg Score | 70.03 | 13 |
| Natural Language Processing | SuperGLUE 1k samples, excl. ReCoRD (dev) | Macro Avg Score | 65.84 | 13 |
| Natural Language Processing | SuperGLUE 100 samples, excl. ReCoRD (dev) | Macro Avg Score | 59.88 | 13 |
| Natural Language Understanding | SuperGLUE Zero-shot | BoolQ Accuracy | 88 | 11 |
| Natural Language Understanding | SuperGLUE 1,000 examples (test) | BoolQ | 86.7 | 10 |
| Text Classification | SuperGLUE (val) | Average Validation Score | 89.2 | 10 |
| NLU and Question Answering | SuperGLUE | SST-2 Accuracy | 94.7 | 9 |
| Failure Diagnosis | SuperGLUE | Macro Similarity Score | 36 | 8 |
| Natural Language Understanding | SuperGLUE v1 (test) | BoolQ Acc | 91.3 | 7 |
| Natural Language Understanding | SuperGLUE | Accuracy (SST2) | 94.81 | 6 |
| Classification | SuperGLUE | RTE Score | 72.2 | 6 |
| Automated Probing | SuperGLUE | Error Rate | 38 | 3 |