| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| HarmBench | | ASR 0 | 88 | 4d ago | |
| WildJailbreak | | Trainable parameters (M) 15,768.31 | 44 | 4d ago | |
| SORRY-Bench | LED-Merging | ASR 10.22 | 40 | 4d ago | |
| Salad Bench | ShaPO-T | MD 0.68 | 36 | 4d ago | |
| HH-RLHF | ShaPO-T | MD Rate 1.09 | 36 | 4d ago | |
| Do-Not-Answer | ShaPO-R | MD 0 | 36 | 4d ago | |
| PKU-SafeRLHF 30K (IID) | ShaPO-T | WR 89.26 | 36 | 4d ago | |
| Harmful Dataset (test) | | Harmful Score 81 | 30 | 4d ago | |
| HarmBench | SFT | MD Score 95 | 18 | 4d ago | |
| Average (Do-Not-Answer, HarmBench, HH-RLHF, Salad Bench) | ShaPO-T | Aggregate Score 0.59 | 18 | 4d ago | |
| PKU-SafeRLHF | PPO | Gold Reward 3.92 | 14 | 4d ago | |
| XSTest | Yi-VL-6B | Compliance 95.2 | 12 | 4d ago | |
| PKU-SafeRLHF in-distribution (test) | DPO | Accuracy (EN) 99.44 | 10 | 4d ago | |
| Safety BeaverTails, HEx-PHI (test) | DASE | BeaverTails Score 95.67 | 10 | 4d ago | |
| SafeRLHF | Chain-of-Thought | Win Rate 83 | 8 | 4d ago | |
| Alpaca 7B (test) | UniARM | HV Score 1.2916 | 5 | 4d ago | |
| WildGuardMix | Chain-of-Thought | Win Rate 55 | 5 | 4d ago | |
| AdvBench | Chain-of-Thought | Wins 99 | 5 | 4d ago | |
| Alpaca-65B Weak-to-strong Safety Alignment (test) | UniARM | HV Score 131.81 | 3 | 4d ago | |
| PKU-SafeRLHF (test) | | RM Safety Accuracy 69.92 | 3 | 4d ago | |
| StrongReject | - | - | 0 | 4d ago | |