| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text Classification | SST-2 (test) | Accuracy: 98 | 185 |
| Sentiment Analysis | SST-2 (test) | Accuracy: 97.1 | 136 |
| Sentiment Classification | SST-2 64 instances (test) | Accuracy: 92.55 | 80 |
| Interpretation | SST-2 | L2 Norm: 0.0434 | 56 |
| Sentiment Analysis | SST-2 (test) | Clean Accuracy: 96.43 | 50 |
| Sentiment Analysis | SST-2 GLUE | F1 Score: 94.9 | 45 |
| Backdoor Defense | SST-2 | CACC: 91.71 | 41 |
| Sentiment Analysis | SST-2 (dev) | Accuracy: 96.8 | 41 |
| Sentiment Analysis | SST-2 | Accuracy: 96.9 | 31 |
| Sentiment Classification | SST-2 | Delta Accuracy: 0.05 | 24 |
| Sentiment Analysis | SST-2 (test) | Avg Accuracy: 86.7 | 24 |
| Text Classification | SST-2 | CA: 95.06 | 20 |
| Text Classification | SST-2 (test) | Delta CACC: 1.57 | 18 |
| Backdoor Trigger Detection | SST-2 | AU-ROC: 98.77 | 16 |
| Sentiment Analysis | SST-2 (test) | CACC (Badnet): 95.55 | 15 |
| Sentiment Analysis | SST-2 (held-out) | F1 Score: 39.8 | 14 |
| Text Clustering | SST-2 (test) | Accuracy: 90.2 | 14 |
| Explanation Evaluation | SST-2 (test) | Sufficiency: 17.69 | 14 |
| Sentiment Analysis | SST-2 (test) | Top-1 Accuracy: 92.66 | 12 |
| Backdoor Purification | SST-2 | CACC: 89.84 | 12 |
| Sentiment Analysis | SST-2 (test) | Attack Success Rate: 100 | 12 |
| Sentiment Analysis | SST-2 original (test) | Accuracy: 95.9 | 11 |
| Sentiment Classification | SST-2 32 samples | Accuracy: 94.4 | 11 |
| Backdoor Trigger Detection | SST-2 (test) | Precision: 1 | 10 |
| Text Classification | SST-2 FTC-metadataset mini (10%) (full dataset 100%) | NLL: 0.1309 | 8 |