| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Natural Language Inference | QNLI | Accuracy | 96.56 | 42 |
| Natural Language Inference | QNLI (test) | Accuracy | 93.3 | 27 |
| Natural Language Inference | QNLI 64 instances (test) | Accuracy | 91.1 | 20 |
| Sentence-pair classification | QNLI | Accuracy | 92.45 | 20 |
| Natural Language Inference | QNLI few-shot zero-shot | Accuracy | 71.6 | 16 |
| Text Classification | QNLI | Accuracy (%) | 92.79 | 15 |
| Text Classification | QNLI (test) | Accuracy (Clean) | 92.8 | 14 |
| Embedding Inversion | QNLI (test) | ROUGE-L | 0.2226 | 12 |
| Natural Language Inference | QNLI (val) | Accuracy | 88.05 | 11 |
| Ranking correlation with full dataset evaluation | QNLI | Kendall Correlation | 0.91 | 10 |
| Natural Language Inference | QNLI (test) | SEAT e-size (Names: Career/Family) | 0.01 | 8 |
| Bias Mitigation | QNLI | Accuracy | 85.39 | 8 |
| Backdoor Defense | QNLI | Accuracy | 85.46 | 8 |
| Natural Language Inference | QNLI standard (test dev) | SAcc | 92.1 | 6 |
| Natural Language Inference | QNLI (dev) | Accuracy | 0.945 | 6 |
| Question-Answer Entailment | QNLI (val) | AUC | 77.09 | 6 |
| Natural Language Inference | QNLI | Total Running Time (s) | 3,011 | 5 |
| Generalization gap prediction | QNLI Case 9 | Gap Prediction Error | 0.18 | 5 |
| Generalization gap prediction | QNLI | Generalization Gap Error | 0.14 | 5 |
| Question Answering NLI | QNLI GLUE (test) | Accuracy | 0.903 | 5 |
| Binary Classification | QNLI | AUC | 77.11 | 5 |
| Natural Language Inference | QNLI Non-Biased | Accuracy | 90.9 | 4 |
| Natural Language Inference | QNLI Biased | Accuracy | 94.3 | 4 |
| Question Answering | QNLI | Accuracy | 92.66 | 4 |
| Natural Language Inference | QNLI (train) | Training Throughput | 13.6 | 4 |