| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Counterfactual Generation | SST2 (test) | SLFR: 29 | 29 | |
| Image Classification | SST2 Rendered | Top-1 Accuracy: 68.4 | 23 | |
| Sentiment Analysis | SST2 | Spearman Rho (x100): 93.32 | 23 | |
| Sentiment Classification | SST2 | Deletion Robustness: 0.2943 | 20 | |
| Sentiment Analysis | SST2 | Accuracy: 94.04 | 20 | |
| Feature Attribution | SST2 | LO: -0.199 | 18 | |
| Out-of-Distribution Detection | SST2 (test) | AUROC: 0.7327 | 17 | |
| Sentiment Classification | SST2 phrase | Accuracy: 93.96 | 16 | |
| Sentiment Analysis | SST2 (test) | HS Score: 54.4 | 14 | |
| Graph Classification | SST2 GraphOOD (test) | Accuracy: 83.52 | 13 | |
| Text Classification | SST2 | Accuracy: 93.42 | 10 | |
| Graph Classification | GRAPH-SST2 (test) | Accuracy: 82.99 | 8 | |
| Classification | SST2 Dir alpha=0.1 | Generalized Accuracy: 92.14 | 6 | |
| Sentiment Analysis | SST2 Dir alpha=0.1 Standard | Personalized Accuracy (Acc_p): 95.9 | 6 | |
| Binary Classification | SST2 32-shot (test) | Accuracy: 75.5 | 5 | |
| Binary Classification | SST2 16-shot (test) | Accuracy: 73.2 | 5 | |
| Binary Classification | SST2 4-shot (test) | Accuracy: 0.698 | 5 | |
| Watermarking | SST2 (test) | ACC: 93.07 | 4 | |
| Indirect Prompt Injection Sanitization | SST2 | GCG Attack Success Rate: 0 | 2 | |
| Text Infilling | SST2 | Perplexity: 256.66 | 2 | |
| Image Classification | SST2 | Accuracy: 53.82 | 2 | |
| Faithfulness Evaluation | SST2 (test) | AUC π-Soft-NS: - | 0 | |