| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Natural Language Inference | ANLI Round 3 | Accuracy 67.9 | 64 |
| Natural Language Inference | ANLI Round 2 | Accuracy 66.5 | 64 |
| Natural Language Inference | ANLI Round 1 | Accuracy 77 | 57 |
| Abductive Natural Language Inference | aNLI (leaderboard) | Accuracy 93.2 | 47 |
| Natural Language Inference | ANLI R3 1.0 (test) | Weighted F1 34.9 | 28 |
| Natural Language Inference | ANLI R2 1.0 (test) | Weighted F1 0.331 | 28 |
| Natural Language Inference | ANLI R1 1.0 (test) | Weighted F1 41.1 | 28 |
| Commonsense Reasoning | aNLI | Accuracy 87.3 | 28 |
| Natural Language Inference | ANLI (test) | Overall Score 92.2 | 28 |
| Natural Language Inference | ANLI | ANLI R1 Accuracy 73.1 | 27 |
| Natural Language Inference | ANLI R3 (test) | Accuracy 44.7 | 26 |
| Natural Language Inference | ANLI R1 (test) | Accuracy 44.3 | 26 |
| Natural Language Inference | ANLI R2 | Accuracy 81.14 | 24 |
| Abductive Commonsense Reasoning | aNLI (test) | Accuracy 92.9 | 23 |
| Natural Language Inference | ANLI (val) | Accuracy 73.37 | 21 |
| Natural Language Inference | ANLI R2 (test) | Accuracy 33.1 | 20 |
| Natural Language Inference | ANLI | Accuracy 52.59 | 18 |
| Natural Language Inference | ANLI Round 2 (test) | Accuracy 51.4 | 14 |
| Natural Language Inference | ANLI (dev) | R1 Score 76.4 | 13 |
| Natural Language Inference | ANLI R3 | Accuracy 46.67 | 8 |
| Natural Language Inference | ANLI R1 | Accuracy 47.3 | 8 |
| Natural Language Inference | ANLI MNLI + SNLI trained (test) | ANLI A1 Score 50 | 8 |
| Natural Language Inference | ANLI MNLI + SNLI trained (dev) | Accuracy (A1) 50.4 | 8 |
| Natural Language Inference | ANLI R3 | Accuracy 67.1 | 7 |
| Natural Language Inference | ANLI R1 | Accuracy 78.5 | 7 |