| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Semantic Hallucination Detection | PAWS | AUROC 91.24 | 36 |
| Paraphrase Detection | PAWS | Accuracy 94.6 | 24 |
| Paraphrase Detection | PAWS original (test) | Accuracy 91.79 | 23 |
| Paraphrase Identification | PAWS -> PAWS (test) | Accuracy 94.1 | 22 |
| Paraphrase Identification | PAWS -> QQP (test) | AUROC 77.7 | 20 |
| Paraphrase Identification | PAWS | Accuracy 97.6 | 17 |
| Paraphrase Detection | PAWS-QQP | Accuracy 96 | 16 |
| Paraphrase Identification | PAWS Out-of-distribution from PIT | Macro F1 55.1 | 10 |
| Paraphrase Identification | PAWS -> WMT (test) | AUROC 0.849 | 10 |
| Ranking correlation with full dataset evaluation | PAWS Wiki | Kendall Correlation 0.96 | 10 |
| Performance Prediction | PAWS | MAE 0.8 | 9 |
| Zero-shot performance prediction | PAWS | MAE 1.92 | 9 |
| Paraphrase Detection | PAWS Wiki | Accuracy 47.5 | 8 |
| Ranking correlation with full dataset evaluation | PAWS QQP | Kendall Correlation 0.87 | 7 |