| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Natural Language Inference | MultiNLI matched (test) | Accuracy 85.38 | 65 |
| Natural Language Inference | MultiNLI Mismatched | Accuracy 79.1 | 60 |
| Natural Language Inference | MultiNLI mismatched (test) | Accuracy 81.4 | 56 |
| Natural Language Inference | MultiNLI Matched | Accuracy 80.2 | 49 |
| Natural Language Inference | MultiNLI mismatched (cross-domain) RepEval 2017 (test) | Accuracy 75.8 | 25 |
| Natural Language Inference | MultiNLI matched (dev) | Accuracy 88.4 | 23 |
| Natural Language Inference | MultiNLI (test) | Accuracy 83.7 | 21 |
| Text Classification | MultiNLI (test) | WGA 81.3 | 18 |
| Natural Language Inference | MultiNLI matched (in-domain) RepEval 2017 (test) | Accuracy 76.8 | 18 |
| Confidence Calibration | MultiNLI Mismatch (test) | ECE 0.0071 | 16 |
| Natural Language Understanding | MultiNLI (Match) | ECE 1.02 | 16 |
| Natural Language Inference | MultiNLI mismatched (dev) | Accuracy 88.4 | 11 |
| Natural Language Inference | MultiNLI matched/mismatched | Accuracy 92.6 | 10 |
| Natural Language Inference | MultiNLI matched (in-domain) | Accuracy 74.6 | 8 |
| Natural Language Inference | MultiNLI matched (val) | Accuracy 91.7 | 8 |
| Natural Language Inference | MultiNLI WILDS (test) | IID Accuracy 82.1 | 6 |
| Natural Language Inference | MultiNLI (val) | Accuracy 73.17 | 5 |
| Natural Language Inference | MultiNLI | Accuracy 82.4 | 3 |