| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| SNLI (test) | UnitedSynT5 | Accuracy | 94.7 | 690 | 29d ago |
| RTE | Enumerate | Accuracy | 93.5 | 448 | 5d ago |
| SNLI | NHP Context-Free Data Aug. | Accuracy | 100 | 180 | 25d ago |
| XNLI (test) | Ours | Average Accuracy | 90 | 167 | 4d ago |
| SNLI (train) | Lexicalized classifier | Accuracy | 99.7 | 154 | 1mo ago |
| CB | FADS-ICL | Accuracy | 98.2 | 118 | 29d ago |
| XNLI | mT5 | Accuracy | 87.1 | 111 | 29d ago |
| MNLI (matched) | DeBERTaxLarge | Accuracy | 91.7 | 110 | 1mo ago |
| MedNLI (test) | T5 | Accuracy | 86.57 | 89 | 1mo ago |
| SciTail (test) | ALUM_ROBERTA-LARGE-SMART | Accuracy | 96.8 | 86 | 1mo ago |
| MultiNLI (test) | Naive + P | Average Worst-Group Accuracy | 88.05 | 81 | 3d ago |
| MNLI | | Accuracy (matched) | 90.8 | 80 | 11d ago |
| SNLI (dev) | ALUM_ROBERTA-LARGE-SMART | Accuracy | 93.6 | 71 | 1mo ago |
| MNLI (mismatched) | METALM | Accuracy | 91 | 68 | 1mo ago |
| ANLI | Zero-shot-EI | Accuracy | 74.02 | 65 | 5d ago |
| MultiNLI matched (test) | BERT-base + PoE | Accuracy | 85.38 | 65 | 1mo ago |
| ANLI Round 3 | LMSI | Accuracy | 67.9 | 64 | 1mo ago |
| ANLI Round 2 | LMSI | Accuracy | 66.5 | 64 | 1mo ago |
| QNLI | UD+-XXL | Accuracy | 96.56 | 61 | 11d ago |
| MultiNLI Mismatched | Densely Interactive Inference Network | Accuracy | 79.1 | 60 | 1mo ago |
| ANLI Round 1 | FLAN-T5 | Accuracy | 77 | 57 | 1mo ago |
| BioNLI | | Accuracy (Chinese) | 72.46 | 56 | 1mo ago |
| MultiNLI mismatched (test) | | Accuracy | 81.4 | 56 | 1mo ago |
| HANS (test) | Roberta-large w/ Z-Aug | Accuracy | 78.65 | 54 | 1mo ago |
| RTE (test) | HiddenKey | Accuracy | 90.25 | 52 | 1mo ago |