
Toxicity

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Toxicity Unlearning | Toxicity | S-unlearning Score 0.45 | 16 |
| Steering | Toxicity | Steering Success 64 | 11 |
| Case Deletion Diagnostics | Toxicity binary subsample (test) | AUC-DEL 2.08 | 10 |
| Adversarial Robustness | Toxicity Perturbation-based | Perplexity 9.52 | 9 |
| Text Classification | Toxicity Nooverlap BERT-small | AUC-DEL Plus 0.003 | 7 |
| Text Classification | Toxicity BERT-small targeted Kaggle 2018 (test) | AUC-DEL+ 0.016 | 7 |
| Toxicity Classification | Toxicity | Original Accuracy 90.4 | 6 |
| Label aggregation assessment | Toxicity (test) | Test Accuracy 79 | 4 |
| Toxicity Reduction | Toxicity | Final Toxicity 0.21 | 3 |