| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Text Classification | SST-2 (test) | Accuracy | 98 | 185 |
| Sentiment Analysis | SST-2 (test) | Accuracy | 97.1 | 136 |
| Sentiment Classification | SST-2 64 instances (test) | Accuracy | 92.55 | 80 |
| Backdoor Defense | SST-2 | CACC | 91.71 | 65 |
| Interpretation | SST-2 | L2 Norm | 0.0434 | 56 |
| Sentiment Analysis | SST-2 (test) | Clean Accuracy | 96.43 | 50 |
| Sentiment Analysis | SST-2 GLUE | F1 Score | 94.9 | 45 |
| Sentiment Analysis | SST-2 (dev) | Accuracy | 96.8 | 41 |
| Sentiment Analysis | SST-2 | Accuracy | 96.9 | 31 |
| Text Classification | SST-2 | Accuracy | 93.62 | 24 |
| Faithfulness Evaluation | SST-2 (test) | Rate of Label Changes | 5.5 | 24 |
| Sentiment Classification | SST-2 | Delta Accuracy | 0.05 | 24 |
| Sentiment Analysis | SST-2 (test) | Avg Accuracy | 86.7 | 24 |
| Sentiment Analysis | SST-2 | CACC | 96.7 | 20 |
| Text Classification | SST-2 | CA | 95.06 | 20 |
| Text Classification | SST-2 (test) | Delta CACC | 1.57 | 18 |
| Backdoor Trigger Detection | SST-2 | AU-ROC | 98.77 | 16 |
| Sentiment Analysis | SST-2 (test) | CACC (Badnet) | 95.55 | 15 |
| Sentiment Analysis | SST-2 (held-out) | F1 Score | 39.8 | 14 |
| Text Clustering | SST-2 (test) | Accuracy | 90.2 | 14 |
| Explanation Evaluation | SST-2 (test) | Sufficiency | 17.69 | 14 |
| Sentiment Analysis | SST-2 (test) | Top-1 Accuracy | 92.66 | 12 |
| Backdoor Purification | SST-2 | CACC | 89.84 | 12 |
| Sentiment Analysis | SST-2 (test) | Attack Success Rate | 100 | 12 |
| Sentiment Analysis | SST-2 original (test) | Accuracy | 95.9 | 11 |