| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Profanity suppression | RealToxicityPrompts | Relative Throughput | 103.6 | 126 |
| Language model detoxification | RealToxicityPrompts (test) | Distinct-1 | 91.3 | 54 |
| Toxicity Mitigation | RealToxicityPrompts challenging | Avg Toxicity (Max) | 6.2 | 46 |
| Detoxification | RealToxicityPrompts challenging | Max Toxicity | 0.062 | 32 |
| Toxicity Evaluation | RealToxicityPrompts | Toxicity Score | 0 | 29 |
| Toxicity Mitigation | REALTOXICITYPROMPTS | Toxicity | 21.24 | 24 |
| Detoxification | RealToxicityPrompts | Avg Max Toxicity | 0.27 | 22 |
| Toxicity Mitigation | RealToxicityPrompts 1k samples | CLS Toxicity | 0.51 | 20 |
| Spoofing attack traceability | RealToxicityPrompts (test) | AUC | 90.11 | 20 |
| Toxicity evaluation | RealToxicityPrompts 1K non-toxic prompts, 1K toxic prompts | Count of Non-Toxic Samples | 5 | 14 |
| Toxicity Mitigation | RealToxicityPrompts (test) | Full Toxicity | 10.1 | 14 |
| Multi-modal Toxicity Attack | RealToxicityPrompts (RTP) (test) | Overall Score | 31.36 | 12 |
| Toxicity Mitigation | RealToxicityPrompts (RTP) | CLS Tox Rate | 0.53 | 12 |
| Toxicity Generation | RealToxicityPrompts (test) | Perspective API Score | 9.2 | 12 |
| Toxicity Analysis | RealToxicityPrompts Nontoxic | Exp. Max. Toxicity | 0.22 | 10 |
| Controlled Text Generation | RealToxicityPrompts 10K nontoxic prompts | Avg Max Toxicity | 30.2 | 9 |
| Non-toxic generation | RealToxicityPrompts | Avg. Max Toxicity | 0.115 | 8 |
| Toxic Text Generation | RealToxicityPrompts malicious | Attack Success Rate (ASR) | 14.8 | 8 |
| Toxicity Auditing | RealToxicityPrompts (hold-out) | Detoxify Identity Attack Score | 4.97 | 7 |
| Multimodal Safety Auditing | RealToxicityPrompts primary evaluation | Detoxify Identity Attack Score | 3.05 | 7 |
| Toxic Language Suppression | RealToxicityPrompts 10K nontoxic prompts GPT2-large generation (test) | Max Toxicity | 0.172 | 7 |
| Toxicity Evaluation | RealToxicityPrompts RTP-N (Nontoxic) | Toxic Fraction | 0.2 | 5 |
| Toxicity Evaluation | RealToxicityPrompts RTP-C | Toxic Fraction | 18.1 | 5 |
| Counterfactual Fairness | RealToxicityPrompts RTP-N | Sentiment Parity | 0.006 | 5 |
| Counterfactual Fairness | RealToxicityPrompts RTP-C | Sentiment Parity | 0.2 | 5 |