
SST-2

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text Classification | SST-2 (test) | Accuracy 98 | 185 |
| Prediction-grounded correlation with output difference (JSD) | SST-2 | Spearman Correlation 0.01 | 145 |
| Sentiment Analysis | SST-2 (test) | Accuracy 97.1 | 144 |
| Sentiment Classification | SST-2 64 instances (test) | Accuracy 92.55 | 80 |
| Backdoor Defense | SST-2 | CACC 91.71 | 65 |
| Interpretation | SST-2 | L2 Norm 0.0434 | 56 |
| Prediction-grounded correlation with accuracy difference | SST-2 | Spearman Correlation 0.58 | 54 |
| Text Classification | SST-2 | Accuracy 94.8 | 54 |
| Sentiment Analysis | SST-2 (test) | Clean Accuracy 96.43 | 50 |
| Sentiment Analysis | SST-2 GLUE | F1 Score 94.9 | 45 |
| Sentiment Analysis | SST-2 (dev) | Accuracy 96.8 | 41 |
| Binary Classification | SST-2 (test) | Accuracy 94.63 | 32 |
| Language similarity grounding | SST-2 | Accuracy Correlation 0.43 | 31 |
| Sentiment Analysis | SST-2 | Accuracy 96.9 | 31 |
| Text Classification | SST-2 | Accuracy 93.62 | 24 |
| Faithfulness Evaluation | SST-2 (test) | Rate of Label Changes 5.5 | 24 |
| Sentiment Classification | SST-2 | Delta Accuracy 0.05 | 24 |
| Sentiment Analysis | SST-2 (test) | Avg Accuracy 86.7 | 24 |
| Sentiment Analysis | SST-2 | CACC 96.7 | 20 |
| Text Classification | SST-2 | CA 95.06 | 20 |
| Text Classification | SST-2 (test) | Delta CACC 1.57 | 18 |
| Backdoor Trigger Detection | SST-2 | AU-ROC 98.77 | 16 |
| Sentiment Analysis | SST-2 (test) | CACC (Badnet) 95.55 | 15 |
| Sentiment Analysis | SST-2 (held-out) | F1 Score 39.8 | 14 |
| Text Clustering | SST-2 (test) | Accuracy 90.2 | 14 |
Showing 25 of 83 rows