
CivilComments

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Toxicity Classification | CivilComments sensitive attribute: MUSLIM (test) | Balanced Accuracy 59.9 | 57 |
| Classification | CivilComments (test) | Worst-case Accuracy 82.2 | 47 |
| Robust Classification | CivilComments | Worst-Group Accuracy 72.6 | 23 |
| Text Classification | CivilComments (16 groups) | Average Accuracy 86.3 | 20 |
| Toxicity detection | CivilComments-WILDS (test) | Average Accuracy 92.7 | 19 |
| Sentiment Classification | CivilComments HELM | Balanced Acc 65.81 | 18 |
| Text Classification | CivilComments | Worst-Group Accuracy 81 | 17 |
| Toxicity Detection | CivilComments (test) | WGA 78.8 | 14 |
| Text Classification | CivilComments-WILDS (test) | Accuracy 92.34 | 13 |
| Toxicity Classification | CivilComments (CC) (test) | Worst-Group Accuracy 79.66 | 13 |
| Classification | CivilComments WILDS | Average Accuracy 85.5 | 6 |
| Text Classification | CivilComments (val) | Accuracy 69.1 | 6 |
| Domain Generalization | CivilComments Wilds (test) | Average Accuracy 92.2 | 6 |
| Domain Generalization | CivilComments Wilds (val) | Average Accuracy 92.3 | 6 |
| Text Classification | CivilComments controlled shortcut injection | Accuracy 57.2 | 5 |
| Toxicity Classification | CivilComments | Average Accuracy 92.6 | 3 |