
MultiNLI

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Natural Language Inference | MultiNLI (test) | Average Worst-Group Accuracy | 88.05 | 81 |
| Natural Language Inference | MultiNLI matched (test) | Accuracy | 85.38 | 65 |
| Natural Language Inference | MultiNLI Mismatched | Accuracy | 79.1 | 60 |
| Natural Language Inference | MultiNLI mismatched (test) | Accuracy | 81.4 | 56 |
| Natural Language Inference | MultiNLI Matched | Accuracy | 80.2 | 49 |
| Natural Language Inference | MultiNLI mismatched (cross-domain) RepEval 2017 (test) | Accuracy | 75.8 | 25 |
| Natural Language Inference | MultiNLI | Accuracy | 82.4 | 23 |
| Natural Language Inference | MultiNLI matched (dev) | Accuracy | 88.4 | 23 |
| Text Classification | MultiNLI (test) | WGA | 81.3 | 18 |
| Natural Language Inference | MultiNLI matched (in-domain) RepEval 2017 (test) | Accuracy | 76.8 | 18 |
| Confidence Calibration | MultiNLI Mismatch (test) | ECE | 0.0071 | 16 |
| Natural Language Understanding | MultiNLI (Match) | ECE | 1.02 | 16 |
| Natural Language Inference | MultiNLI mismatched (dev) | Accuracy | 88.4 | 11 |
| Natural Language Inference | MultiNLI matched/mismatched | Accuracy | 92.6 | 10 |
| Natural Language Inference | MultiNLI matched (in-domain) | Accuracy | 74.6 | 8 |
| Natural Language Inference | MultiNLI matched (val) | Accuracy | 91.7 | 8 |
| Text Classification | MultiNLI | Average Accuracy | 81.1 | 7 |
| Natural Language Inference | MultiNLI WILDS (test) | IID Accuracy | 82.1 | 6 |
| Natural Language Inference | MultiNLI reconstructed with controlled shortcut injection (test) | MSTPS | 0.797 | 5 |
| Natural Language Inference | MultiNLI controlled shortcut injection | Accuracy | 32.3 | 5 |
| Natural Language Inference | MultiNLI (val) | Accuracy | 73.17 | 5 |