| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Toxicity Classification | CivilComments sensitive attribute: MUSLIM (test) | Balanced Accuracy 59.9 | 57 |
| Classification | CivilComments (test) | Worst-case Accuracy 82.2 | 47 |
| Robust Classification | CivilComments | Worst-Group Accuracy 72.6 | 23 |
| Toxicity detection | CivilComments-WILDS (test) | Average Accuracy 92.7 | 19 |
| Sentiment Classification | CivilComments HELM | Balanced Acc 65.81 | 18 |
| Text Classification | CivilComments-WILDS (test) | Accuracy 92.34 | 13 |
| Toxicity Classification | CivilComments (CC) (test) | Worst-Group Accuracy 79.66 | 13 |
| Toxicity Detection | CivilComments (test) | WGA 71.6 | 9 |
| Text Classification | CivilComments (val) | Accuracy 69.1 | 6 |
| Domain Generalization | CivilComments Wilds (test) | Average Accuracy 92.2 | 6 |
| Domain Generalization | CivilComments Wilds (val) | Average Accuracy 92.3 | 6 |
| Toxicity Classification | CivilComments | Average Accuracy 92.6 | 3 |