| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| AttaQ benchmark | RAD | Avg Toxicity (Max) | 0.045 | 32 | 4d ago |
| RealToxicityPrompts challenging | RAD | Max Toxicity | 0.062 | 32 | 4d ago |
| Jigsaw (test) | Prefix | Perplexity (PPL) | 20.8 | 29 | 4d ago |
| BOLD | RAD | Toxicity (Max) | 1.9 | 28 | 3d ago |
| RealToxicityPrompts | SGEAT + DEXPERTS | Avg Max Toxicity | 0.27 | 22 | 4d ago |
| SafeEdit | STA | Detoxification Performance | 95.78 | 18 | 4d ago |
| ToxiGen (test) | DetoxLLM | MTV | 97.4 | 16 | 4d ago |
| SafeNLP (test) | Llama-2-7b | Similarity | 84 | 13 | 3d ago |
| AdvBench harmful behavior set | PaCE | Safety Score | 99.17 | 10 | 3d ago |
| REALTOXICITYPROMPTS (test) | FINE-GRAINED RLHF | Toxicity Score (Avg) | 0.081 | 5 | 3d ago |
| Detoxification | DuNST | Fluency | 3.58 | 3 | 4d ago |