
STS-B

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Semantic Textual Similarity | STS-B | Spearman's Rho (x100) | 92.9 | 156 |
| Semantic Similarity | STS-B (test) | Semantic Consistency | 80.22 | 18 |
| Imbalanced Regression | STS-B-DIR Few-shot (test) | MSE | 0.781 | 14 |
| Imbalanced Regression | STS-B-DIR Medium-shot (test) | MSE | 0.899 | 14 |
| Imbalanced Regression | STS-B-DIR Many-shot (test) | MSE | 0.795 | 14 |
| Imbalanced Regression | STS-B-DIR All (test) | MSE | 0.892 | 14 |
| Regression | STS-B-DIR Few-shot | MSE | 0.781 | 14 |
| Regression | STS-B-DIR Medium-shot | MSE | 0.899 | 14 |
| Regression | STS-B-DIR Many-shot | MSE | 0.795 | 14 |
| Regression | STS-B DIR (All) | MSE | 0.892 | 14 |
| Semantic Textual Similarity | STS-B (test) | PICP | 95.94 | 12 |
| Semantic Textual Similarity | STS-B | Accuracy | 0.595 | 10 |
| Semantic Textual Similarity | STS-B | CCC | 61.08 | 9 |
| Semantic Textual Similarity | Multilingual STS-B (val) | Spearman Correlation | 77.48 | 8 |
| Semantic Textual Similarity | STS-B (dev) | Pearson Correlation | 0.918 | 6 |
| Text Similarity Regression | STS-B DIR (test) | MSE (All) | 0.877 | 6 |
| Uncertainty Estimation | STS-B DIR Few | NLL | 2.152 | 5 |
| Uncertainty Estimation | STS-B-DIR Medium | NLL | 2.754 | 5 |
| Uncertainty Estimation | STS-B-DIR Many | NLL | 1.81 | 5 |
| Uncertainty Estimation | STS-B DIR (All) | NLL | 1.996 | 5 |
| Regression | STS-B (test) | Spearman Corr (%) | 88.94 | 5 |
| Semantic Textual Similarity | STS-B | Latency (ms) | 22.63 | 4 |
| Intrinsic Bias Evaluation | STS-B | StereoSet Score | 54.53 | 3 |
| Sentence Ranking | STS-B | KCC | 63.64 | 3 |
| Sentence Retrieval | STS-B (test) | Recall@1 | 78.87 | 2 |
Showing 25 of 26 rows