| Dataset Name | SOTA Method | Metric | Value | | Last Updated |
|---|---|---|---|---|---|
| Detoxification dataset | Palette w/ LaPA² | Toxicity Score | 19.07 | 42 | 15d ago |
| AttaQ benchmark | RAD | Avg Toxicity (Max) | 0.045 | 32 | 3mo ago |
| RealToxicityPrompts challenging | RAD | Max Toxicity | 0.062 | 32 | 3mo ago |
| Jigsaw (test) | Prefix | Perplexity (PPL) | 20.8 | 29 | 3mo ago |
| BOLD | RAD | Toxicity (Max) | 1.9 | 28 | 3mo ago |
| RealToxicityPrompts | SGEAT + DEXPERTS | Avg Max Toxicity | 0.27 | 22 | 3mo ago |
| SafeEdit | STA | Detoxification Performance | 95.78 | 18 | 3mo ago |
| ToxiGen (test) | DetoxLLM | MTV | 97.4 | 16 | 3mo ago |
| SafeNLP (test) | Llama-2-7b | Similarity | 84 | 13 | 3mo ago |
| AdvBench harmful behavior set | PaCE | Safety Score | 99.17 | 10 | 3mo ago |
| OOD | | TP Score | 54 | 6 | 1mo ago |
| ID | | TP Score | 55 | 6 | 1mo ago |
| REALTOXICITYPROMPTS (test) | FINE-GRAINED RLHF | Toxicity Score (Avg) | 0.081 | 5 | 3mo ago |
| Detoxification | DuNST | Fluency | 3.58 | 3 | 3mo ago |