| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Natural Language Inference | SICK | Accuracy | 91.58 | 85 |
| Sentence Similarity | SICK | Spearman Correlation | 70.82 | 56 |
| Semantic Relatedness | SICK 2014 (test) | Pearson's r | 0.884 | 56 |
| Semantic Textual Similarity | SICK Slovak (val) | Pearson Correlation | 0.778 | 33 |
| Semantic Textual Similarity | SICK-R (test) | Similarity Score | 60.74 | 30 |
| Outlier Detection | Sick | AP (%) | 36.3 | 22 |
| Semantic Similarity | SICK | Accuracy | 652.1 | 21 |
| Classification | Sick (test) | Accuracy | 98.94 | 21 |
| Textual Entailment | SICK (test) | Accuracy | 90.3 | 21 |
| Sentence Relatedness | SICK (test + train) | Spearman Correlation | 0.61 | 21 |
| Semantic Textual Similarity | SICK-R | Spearman Rho (x100) | 72.56 | 16 |
| Classification | Sick | F1 Score | 88.71 | 15 |
| Natural Language Entailment | SICK-E | Spearman Rho (x100) | 71.26 | 12 |
| Semantic Textual Similarity | SICK (test) | Spearman Correlation | 0.7669 | 12 |
| Semantic Relatedness | SICK | Pearson r | 0.868 | 12 |
| Outlier Detection | Sick | AUC | 0.918 | 11 |
| Outlier Detection | Sick | AUC-ROC | 90.2 | 11 |
| Outlier Detection | Sick | AUC-PR | 0.355 | 11 |
| Imbalanced Classification | Sick | Macro F1 | 89.63 | 8 |
| Sentence Relatedness | SICK (test) | Pearson Correlation (r) | 0.8695 | 7 |
| Semantic Similarity | SICK-R (test) | Semantic Consistency (spring) | 67.03 | 5 |
| Natural Language Inference Explanation Evaluation | SICK (sample) | Average Score | 95.63 | 4 |
| Semantic Textual Similarity | SICK | Pearson Correlation | 0.915 | 4 |
| Semantic Relatedness | SICK (test) | MSE | 0.233 | 4 |
| Regression | SICK-R | Spearman Correlation | 86.54 | 3 |