| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Semantic Relatedness | SICK 2014 (test) | Pearson's r: 0.884 | 56 |
| Semantic Textual Similarity | SICK Slovak (val) | Pearson Correlation: 0.778 | 33 |
| Outlier Detection | Sick | AP (%): 36.3 | 22 |
| Classification | Sick (test) | Accuracy: 98.94 | 21 |
| Textual Entailment | SICK (test) | Accuracy: 90.3 | 21 |
| Sentence Relatedness | SICK (test + train) | Spearman Correlation: 0.61 | 21 |
| Classification | Sick | F1 Score: 88.71 | 15 |
| Natural Language Inference | SICK | Accuracy: 88.1 | 15 |
| Natural Language Entailment | SICK-E | Spearman Rho (x100): 71.26 | 12 |
| Semantic Textual Similarity | SICK (test) | Spearman Correlation: 0.7669 | 12 |
| Semantic Relatedness | SICK | Pearson r: 0.868 | 12 |
| Outlier Detection | Sick | AUC: 0.918 | 11 |
| Outlier Detection | Sick | AUC-ROC: 90.2 | 11 |
| Outlier Detection | Sick | AUC-PR: 0.355 | 11 |
| Semantic Textual Similarity | SICK-R | Spearman Rho (x100): 65.44 | 11 |
| Sentence Relatedness | SICK (test) | Pearson Correlation (r): 0.8695 | 7 |
| Semantic Similarity | SICK-R (test) | Semantic Consistency (spring): 67.03 | 5 |
| Natural Language Inference Explanation Evaluation | SICK (sample) | Average Score: 95.63 | 4 |
| Semantic Textual Similarity | SICK | Pearson Correlation: 0.915 | 4 |
| Semantic Relatedness | SICK (test) | MSE: 0.233 | 4 |
| Sentence Ranking | SICK-R | KCC: 57.4 | 3 |
| Semantic Relatedness | SICK filtered 2014 (test) | RMSE: 0.24 | 3 |