
ToxiGen

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Toxicity Detection | ToxiGen | Score 84.23 | 95 |
| Safety Evaluation | ToxiGen | Safety 100 | 77 |
| Hate speech classification | ToxiGen (test) | AUC 99 | 24 |
| Toxicity Generation | ToxiGen | ToxiGen Score 1,633 | 24 |
| Toxicity Classification | Toxigen | Accuracy 60.41 | 22 |
| Harmlessness | Toxigen | Toxigen (%) 100 | 17 |
| Detoxification | ToxiGen (test) | MTV 97.4 | 16 |
| Influence Estimation | ToxiGen (test) | Spearman Correlation 0.44 | 14 |
| Machine Unlearning | ToxiGen (test) | Accuracy ($D_f$) 86.9 | 13 |
| Machine Unlearning | ToxiGen (train) | Accuracy ($D_f$) 85.06 | 13 |
| Text Classification | ToxiGen (test) | Accuracy 85 | 12 |
| Bias Detection | Toxigen (test) | Accuracy 90.3 | 12 |
| Safety Evaluation | ToxiGen Pretrained Evaluation | Toxicity Rate 14.53 | 12 |
| Toxicity Detection | TOXIGEN (val) | AUC 96 | 8 |
| Safety Over-triggering | ToxiGen | Over-trigger Rate: Jewish 0.02 | 7 |
| Implicit Hate Speech Detection | Toxigen | Macro-F1 93.41 | 5 |
| Formal verification of safety classifiers | Toxigen | tau* Score 0.9 | 3 |
| Misuse Detection | ToxiGen Homophobia (external) | TPR 98 | 1 |
| Misuse Detection | ToxiGen Ethnoracial (external) | TPR 91 | 1 |
| Detoxification Dataset Quality Evaluation | ToxiGen 500 neutral-toxic pairs | Overall O. 2.475 | 1 |