| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Paraphrase Identification | QQP | Accuracy | 91.7 | 78 |
| Paraphrase Detection | QQP (test) | Accuracy | 95.3 | 51 |
| Sentence-pair classification | QQP | Accuracy | 88.8 | 40 |
| Paraphrasing | QQP | BLEU | 33 | 22 |
| Paraphrase Generation | QQP (test) | BLEU-2 | 41.74 | 22 |
| Paraphrase Identification | QQP Out-of-distribution from PAWS | Macro F1 | 70.8 | 20 |
| Seq2Seq | QQP | ROUGE-L | 66 | 18 |
| Seq2Seq generation | QQP | BLEU | 0.3142 | 17 |
| Text Classification | QQP | RDC | 0 | 16 |
| Paraphrase Identification | QQP few-shot zero-shot | Accuracy | 74 | 16 |
| Paraphrase Detection | QQP source: RTE (test) | Accuracy | 71.5 | 12 |
| Paraphrasing | QQP | Semantic Faithfulness | 90.26 | 11 |
| Paraphrase Detection | QQP | F1 Score | 89 | 10 |
| Paraphrase Identification | QQP Out-of-distribution from PIT | Macro F1 | 0.757 | 10 |
| Paraphrase Identification | QQP -> WMT (test) | AUROC | 85.1 | 10 |
| Ranking correlation with full dataset evaluation | QQP | Kendall Correlation | 0.95 | 10 |
| Paraphrase Detection | QQP | Accuracy | 79.2 | 9 |
| Classification | QQP | ASR | 20 | 8 |
| Paraphrase Detection | QQP IID | Accuracy | 84.8 | 8 |
| Bias Mitigation | QQP | Accuracy | 80.1 | 8 |
| Backdoor Defense | QQP | Accuracy | 80.76 | 8 |
| Paraphrase Detection | QQP | Average Accuracy | 71.2 | 8 |
| Paraphrase Detection | QQP In-Domain (test) | Accuracy | 91.66 | 8 |
| Paraphrase Detection | QQP (dev) | Accuracy | 92.7 | 6 |
| Paraphrase Detection | QQP | Total Running Time (s) | 8,279 | 5 |