| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| HarmBench | STAR-1 | Harmful Response Ratio | 21.26 | 21 | 1mo ago |
| HH Harmless | DLMA | Beaver-7B Cost Score | 3.25 | 10 | 1mo ago |
| PKU-SafeRLHF | DLMA | Beaver-7B-Cost Score | -1.11 | 10 | 1mo ago |
| HarmMetric Eval | HarmClassifier | Score (Effectiveness) | 89.6 | 2 | 1mo ago |