Toxicity

Benchmarks

Task Name	Dataset Name	SOTA Result
Toxicity Unlearning	Toxicity	S-unlearning Score: 0.45	16
Steering	Toxicity	Steering Success: 64	11
Case Deletion Diagnostics	Toxicity binary subsample (test)	AUC-DEL: 2.08	10
Adversarial Robustness	Toxicity Perturbation-based	Perplexity: 9.52	9
Text Classification	Toxicity Nooverlap BERT-small	AUC-DEL Plus: 0.003	7
Text Classification	Toxicity BERT-small targeted Kaggle 2018 (test)	AUC-DEL+: 0.016	7
Toxicity Classification	Toxicity	Original Accuracy: 90.4	6
Label aggregation assessment	Toxicity (test)	Test Accuracy: 79	4
Toxicity Reduction	Toxicity	Final Toxicity: 0.21	3
