| Dataset | SOTA method | Metric | Score | Trend |
|---|---|---|---|---|
| SNLI (test) | UnitedSynT5 | Accuracy | 94.7 | 681 |
| RTE | LAT | Accuracy | 90 | 367 |
| SNLI | NHP Context-Free Data Aug. | Accuracy | 100 | 174 |
| XNLI (test) | Ours | Average Accuracy | 90 | 167 |
| SNLI (train) | Lexicalized classifier | Accuracy | 99.7 | 154 |
| XNLI | mT5 | Accuracy | 87.1 | 111 |
| CB | FADS-ICL | Accuracy | 98.2 | 110 |
| MNLI (matched) | DeBERTaxLarge | Accuracy | 91.7 | 110 |
| MedNLI (test) | T5 | Accuracy | 86.57 | 89 |
| SciTail (test) | ALUM_ROBERTA-LARGE-SMART | Accuracy | 96.8 | 86 |
| MNLI | | Accuracy (matched) | 90.8 | 80 |
| SNLI (dev) | ALUM_ROBERTA-LARGE-SMART | Accuracy | 93.6 | 71 |
| MNLI (mismatched) | METALM | Accuracy | 91 | 68 |
| MultiNLI matched (test) | BERT-base + PoE | Accuracy | 85.38 | 65 |
| ANLI Round 3 | LMSI | Accuracy | 67.9 | 64 |
| ANLI Round 2 | LMSI | Accuracy | 66.5 | 64 |
| MultiNLI Mismatched | Densely Interactive Inference Network | Accuracy | 79.1 | 60 |
| ANLI Round 1 | FLAN-T5 | Accuracy | 77 | 57 |
| BioNLI | | Accuracy (Chinese) | 72.46 | 56 |
| MultiNLI mismatched (test) | | Accuracy | 81.4 | 56 |
| HANS (test) | Roberta-large w/ Z-Aug | Accuracy | 78.65 | 54 |
| RTE (test) | HiddenKey | Accuracy | 90.25 | 52 |
| MultiNLI Matched | CAFE Ensemble | Accuracy | 80.2 | 49 |
| E-SNLI | ColD-Fusion | Accuracy | 91.31 | 46 |
| MNLI (dev) | RoBERTa | Accuracy (matched) | 90.2 | 44 |