STS-B

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Semantic Textual Similarity | STS-B | Spearman's Rho (x100): 92.9 | 70 |
| Semantic Similarity | STS-B (test) | Semantic Consistency: 80.22 | 18 |
| Imbalanced Regression | STS-B-DIR Few-shot (test) | MSE: 0.781 | 14 |
| Imbalanced Regression | STS-B-DIR Medium-shot (test) | MSE: 0.899 | 14 |
| Imbalanced Regression | STS-B-DIR Many-shot (test) | MSE: 0.795 | 14 |
| Imbalanced Regression | STS-B-DIR All (test) | MSE: 0.892 | 14 |
| Regression | STS-B-DIR Few-shot | MSE: 0.781 | 14 |
| Regression | STS-B-DIR Medium-shot | MSE: 0.899 | 14 |
| Regression | STS-B-DIR Many-shot | MSE: 0.795 | 14 |
| Regression | STS-B DIR (All) | MSE: 0.892 | 14 |
| Semantic Textual Similarity | STS-B | Accuracy: 0.595 | 10 |
| Semantic Textual Similarity | Multilingual STS-B (val) | Spearman Correlation: 77.48 | 8 |
| Semantic Textual Similarity | STS-B (dev) | Pearson Correlation: 0.918 | 6 |
| Text Similarity Regression | STS-B DIR (test) | MSE (All): 0.877 | 6 |
| Uncertainty Estimation | STS-B DIR Few | NLL: 2.152 | 5 |
| Uncertainty Estimation | STS-B-DIR Medium | NLL: 2.754 | 5 |
| Uncertainty Estimation | STS-B-DIR Many | NLL: 1.81 | 5 |
| Uncertainty Estimation | STS-B DIR (All) | NLL: 1.996 | 5 |
| Regression | STS-B (test) | Spearman Corr (%): 88.94 | 5 |
| Intrinsic Bias Evaluation | STS-B | StereoSet Score: 54.53 | 3 |
| Sentence Ranking | STS-B | KCC: 63.64 | 3 |
| Sentence Retrieval | STS-B (test) | Recall@1: 78.87 | 2 |
| Rank correlation between domain similarity and representation similarity | Multi STS-B May, 2021 (test) | Spearman's rho: 0.924 | 2 |