| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Natural Language Inference | SNLI (test) | Accuracy | 94.7 | 690 |
| Natural Language Inference | SNLI | Accuracy | 100 | 180 |
| Natural Language Inference | SNLI (train) | Accuracy | 99.7 | 154 |
| Natural Language Inference | SNLI (dev) | Accuracy | 93.6 | 71 |
| Counterfactual Generation | SNLI Hypothesis | LFR | 83 | 37 |
| Counterfactual Generation | SNLI Premise | LFR | 0.759 | 37 |
| Natural Language Inference | SNLI hard 1.0 (test) | Accuracy | 84.48 | 27 |
| Explanation Faithfulness | SNLI | Delta AF | 0.989 | 24 |
| Masked Language Modeling | SNLI (randomly sampled) | PPL (U) | 8.57 | 20 |
| Natural Language Inference | SNLI 1.0 (test) | Accuracy | 90.67 | 19 |
| Explanation Evaluation | SNLI (test) | Sufficiency | 43.76 | 16 |
| Membership Inference Attack | SNLI | ROC AUC | 99.8 | 12 |
| Natural Language Inference | SNLI source: MNLI (test) | Accuracy | 80.2 | 12 |
| Ranking correlation with full dataset evaluation | SNLI | Kendall Correlation | 0.93 | 10 |
| Identifying plausible explanations | δ-SNLI | Accuracy | 81.6 | 9 |
| Natural Language Inference | SNLI 1.0 (train) | Accuracy | 93.1 | 9 |
| Textual Entailment | SNLI | ASR | 22.1 | 8 |
| Natural Language Inference | SNLI | Fine-tuning Rounds | 255 | 8 |
| Natural Language Inference | SNLI | VRAM | 131.39 | 8 |
| Natural Language Inference | SNLI Counterfactual | Accuracy | 59.9 | 8 |
| Natural Language Inference | SNLI In-Domain (test) | Accuracy | 91.68 | 8 |
| Natural Language Inference | adv-SNLI TextFooler-RoBERTa | Accuracy | 52.6 | 8 |
| Natural Language Inference | adv-SNLI TextFooler-BERT | Accuracy | 62.3 | 8 |
| Ordinal Classification | SNLI standard (test) | F1 Score | 89.1 | 7 |
| Text Classification | SNLI | Accuracy | 88.2 | 6 |