| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| DirectHarm | | Harmfulness Score | 5 | 56 | 1mo ago |
| DirectHarm (test) | | Harmfulness Score (Llama-Guard-3B) | 5 | 56 | 1mo ago |
| Harmfulness Evaluation Suite JBB, SR, WJ, GCG, JBC, PAIR (test) | No train | JBB | 91 | 36 | 1mo ago |
| Mousetrap | | Harmfulness Score | 3.78 | 22 | 27d ago |
| AutoRAN | SAFEPATH-FT | Harmfulness Score | 1.32 | 22 | 27d ago |
| PAIR | SAFEPATH-FT | Harmfulness Score | 1.08 | 22 | 13d ago |
| GCG | SAFEPATH-FT | Harmfulness Score | 1.16 | 22 | 27d ago |
| AdvBench | SAFEPATH-FT | Harmfulness Score | 1.06 | 22 | 27d ago |
| HarmBench | STAR-1 | Harmful Response Ratio | 21.26 | 21 | 3mo ago |
| AdvBench-audio Harmful | MDSteer-h2s | ASR Score | 26.35 | 15 | 26d ago |
| SORRY-Bench audio | MDSteer-c2r | ASR Accuracy | 78.41 | 15 | 26d ago |
| Figstep-audio Harmful | MDSteer-c2r | ASR | 90.8 | 15 | 26d ago |
| AJailBench Harmful | MDSteer-c2r | ASR | 49 | 12 | 26d ago |
| HH Harmless | DLMA | Beaver-7B Cost Score | 3.25 | 10 | 3mo ago |
| PKU-SafeRLHF | DLMA | Beaver-7B-Cost Score | -1.11 | 10 | 3mo ago |
| HarmfulQ | NLCf/800 step | Harmfulness Rate | 0 | 6 | 1d ago |
| I-Controversial | Mistral MoE-XL | Rate | 0 | 6 | 1d ago |
| I-CoNa | NLCf/800 step | Harmfulness Rate | 0 | 6 | 1d ago |
| I-Malicious | NLCf/800 step | Harmful Rate | 0 | 6 | 1d ago |
| HarmMetric Eval | HarmClassifier | Score (Effectiveness) | 89.6 | 2 | 2mo ago |