| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Natural Language Inference | SNLI (test) | Accuracy | 94.7 | 681 |
| Natural Language Inference | SNLI | Accuracy | 100 | 174 |
| Natural Language Inference | SNLI (train) | Accuracy | 99.7 | 154 |
| Natural Language Inference | SNLI (dev) | Accuracy | 93.6 | 71 |
| Counterfactual Generation | SNLI Hypothesis | LFR | 83 | 37 |
| Counterfactual Generation | SNLI Premise | LFR | 0.759 | 37 |
| Natural Language Inference | SNLI hard 1.0 (test) | Accuracy | 84.48 | 27 |
| Explanation Faithfulness | SNLI | Delta AF | 0.989 | 24 |
| Masked Language Modeling | SNLI (randomly sampled) | PPL (U) | 8.57 | 20 |
| Natural Language Inference | SNLI 1.0 (test) | Accuracy | 90.67 | 19 |
| Explanation Evaluation | SNLI (test) | Sufficiency | 43.76 | 16 |
| Membership Inference Attack | SNLI | ROC AUC | 99.8 | 12 |
| Natural Language Inference | SNLI source: MNLI (test) | Accuracy | 80.2 | 12 |
| Ranking correlation with full dataset evaluation | SNLI | Kendall Correlation | 0.93 | 10 |
| Identifying plausible explanations | δ-SNLI | Accuracy | 81.6 | 9 |
| Natural Language Inference | SNLI 1.0 (train) | Accuracy | 93.1 | 9 |
| Natural Language Inference | SNLI Counterfactual | Accuracy | 59.9 | 8 |
| Natural Language Inference | SNLI In-Domain (test) | Accuracy | 91.68 | 8 |
| Natural Language Inference | adv-SNLI TextFooler-RoBERTa | Accuracy | 52.6 | 8 |
| Natural Language Inference | adv-SNLI TextFooler-BERT | Accuracy | 62.3 | 8 |
| Ordinal Classification | SNLI standard (test) | F1 Score | 89.1 | 7 |
| Text Classification | SNLI | Accuracy | 88.2 | 6 |
| Natural Language Inference | SNLI 3-Choice | ΔAcc | 11.7 | 6 |
| Counterfactual Faithfulness | SNLI | Faithfulness Score | 0.243 | 6 |
| Redaction Faithfulness | SNLI | Faithfulness Score | 0.355 | 6 |