| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Text Classification | SST-2 (test) | Accuracy | 98 | 185 |
| Prediction-grounded correlation with output difference (JSD) | SST-2 | Spearman Correlation | 0.01 | 145 |
| Sentiment Analysis | SST-2 (test) | Accuracy | 97.1 | 144 |
| Sentiment Classification | SST-2 64 instances (test) | Accuracy | 92.55 | 80 |
| Backdoor Defense | SST-2 | CACC | 91.71 | 65 |
| Interpretation | SST-2 | L2 Norm | 0.0434 | 56 |
| Prediction-grounded correlation with accuracy difference | SST-2 | Spearman Correlation | 0.58 | 54 |
| Text Classification | SST-2 | Accuracy | 94.8 | 54 |
| Sentiment Analysis | SST-2 (test) | Clean Accuracy | 96.43 | 50 |
| Sentiment Analysis | SST-2 GLUE | F1 Score | 94.9 | 45 |
| Sentiment Analysis | SST-2 (dev) | Accuracy | 96.8 | 41 |
| Binary Classification | SST-2 (test) | Accuracy | 94.63 | 32 |
| Language similarity grounding | SST-2 | Accuracy Correlation | 0.43 | 31 |
| Sentiment Analysis | SST-2 | Accuracy | 96.9 | 31 |
| Text Classification | SST-2 | Accuracy | 93.62 | 24 |
| Faithfulness Evaluation | SST-2 (test) | Rate of Label Changes | 5.5 | 24 |
| Sentiment Classification | SST-2 | Delta Accuracy | 0.05 | 24 |
| Sentiment Analysis | SST-2 (test) | Avg Accuracy | 86.7 | 24 |
| Sentiment Analysis | SST-2 | CACC | 96.7 | 20 |
| Text Classification | SST-2 | CA | 95.06 | 20 |
| Text Classification | SST-2 (test) | Delta CACC | 1.57 | 18 |
| Backdoor Trigger Detection | SST-2 | AU-ROC | 98.77 | 16 |
| Sentiment Analysis | SST-2 (test) | CACC (Badnet) | 95.55 | 15 |
| Sentiment Analysis | SST-2 (held-out) | F1 Score | 39.8 | 14 |
| Text Clustering | SST-2 (test) | Accuracy | 90.2 | 14 |