RealToxicityPrompts

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result
Profanity suppression	RealToxicityPrompts	Relative Throughput	103.6	126
Language model detoxification	RealToxicityPrompts (test)	Distinct-1	91.3	54
Toxicity Mitigation	RealToxicityPrompts challenging	Avg Toxicity (Max)	6.2	46
Toxicity Evaluation	RealToxicityPrompts	Toxicity Score	0	33
Detoxification	RealToxicityPrompts challenging	Max Toxicity	0.062	32
Toxicity Mitigation	REALTOXICITYPROMPTS	Toxicity	21.24	24
Detoxification	RealToxicityPrompts	Avg Max Toxicity	0.27	22
Toxicity Mitigation	RealToxicityPrompts 1k samples	CLS Toxicity	0.51	20
Spoofing attack traceability	RealToxicityPrompts (test)	AUC	90.11	20
Toxicity evaluation	RealToxicityPrompts 1K non-toxic prompts, 1K toxic prompts	Count of Non-Toxic Samples	5	14
Toxicity Mitigation	RealToxicityPrompts (test)	Full Toxicity	10.1	14
Multi-modal Toxicity Attack	RealToxicityPrompts (RTP) (test)	Overall Score	31.36	12
Toxicity Mitigation	RealToxicityPrompts (RTP)	CLS Tox Rate	0.53	12
Toxicity Generation	RealToxicityPrompts (test)	Perspective API Score	9.2	12
Toxicity Analysis	RealToxicityPrompts Nontoxic	Exp. Max. Toxicity	0.22	10
Controlled Text Generation	RealToxicityPrompts 10K nontoxic prompts	Avg Max Toxicity	30.2	9
Non-toxic generation	RealToxicityPrompts	Avg. Max Toxicity	0.115	8
Toxic Text Generation	RealToxicityPrompts malicious	Attack Success Rate (ASR)	14.8	8
Toxicity Auditing	RealToxicityPrompts (hold-out)	Detoxify Identity Attack Score	4.97	7
Multimodal Safety Auditing	RealToxicityPrompts primary evaluation	Detoxify Identity Attack Score	3.05	7
Toxic Language Suppression	RealToxicityPrompts 10K nontoxic prompts GPT2-large generation (test)	Max Toxicity	0.172	7
Bias Scoring Correlation	RealToxicityPrompts sampled completions	Spearman Correlation Coefficient	0.4748	6
Safety-shift experiment	RealToxicityPrompts	Coverage	90	5
Toxicity generation robustness under visual adversarial attack	RealToxicityPrompts	Identity Count	23	5
Toxicity Evaluation	RealToxicityPrompts RTP-N (Nontoxic)	Toxic Fraction	0.2	5

Showing 25 of 42 rows