CivilComments

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Toxicity Classification | CivilComments sensitive attribute: MUSLIM (test) | Balanced Accuracy 59.9 | 57 |
| Classification | CivilComments (test) | Average Accuracy 92.2 | 51 |
| Robust Classification | CivilComments | Worst-Group Accuracy 72.6 | 23 |
| Toxicity Detection | CivilComments BERT (test) | Oracle ECE 0.57 | 20 |
| Text Classification | CivilComments (16 groups) | Average Accuracy 86.3 | 20 |
| Toxicity detection | CivilComments-WILDS (test) | Average Accuracy 92.7 | 19 |
| Sentiment Classification | CivilComments HELM | Balanced Acc 65.81 | 18 |
| Text Classification | CivilComments | Worst-Group Accuracy 81 | 17 |
| Toxicity Detection | CivilComments (test) | WGA 78.8 | 14 |
| Text Classification | CivilComments-WILDS (test) | Accuracy 92.34 | 13 |
| Toxicity Classification | CivilComments (CC) (test) | Worst-Group Accuracy 79.66 | 13 |
| Toxicity Classification | CivilComments WILDS | Worst-Group Accuracy 75.3 | 11 |
| Classification | CivilComments WILDS | Average Accuracy 85.5 | 6 |
| Text Classification | CivilComments (val) | Accuracy 69.1 | 6 |
| Domain Generalization | CivilComments Wilds (test) | Average Accuracy 92.2 | 6 |
| Domain Generalization | CivilComments Wilds (val) | Average Accuracy 92.3 | 6 |
| Calibration | CivilComments BERT (test) | ECE (Oracle Estimate) 1.08 | 5 |
| Text Classification | CivilComments controlled shortcut injection | Accuracy 57.2 | 5 |
| text classification | CivilComments | ECE 0.092 | 4 |
| Toxicity Classification | CivilComments | Average Accuracy 92.6 | 3 |
| Accountability Attribution | CivilComments (test) | Stage 1 Score 85.76 | 2 |