| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Jailbreak Attack | HarmBench | Attack Success Rate (ASR) | 100 | 557 |
| Red-Teaming | HarmBench | ASR | 96.3 | 244 |
| Jailbreak Attack | HarmBench (test) | ASRHB | 99.73 | 212 |
| Safety Evaluation | HarmBench | ASR | 0 | 148 |
| Safety Evaluation | Harmbench | Harmbench Score | 0.06 | 127 |
| Response Harmfulness Detection | HarmBench | F1 Score | 98.94 | 100 |
| Jailbreak Defense | HarmBench | PAIR ASR | 0 | 91 |
| Safety Alignment | HarmBench | ASR | 0 | 88 |
| Unsafe Robustness | HarmBench | Unsafe Rate | 1 | 72 |
| Jailbreak Robustness | HarmBench | HarmBench ASR | 0 | 72 |
| Jailbreaking | HarmBench | Attack Success Rate (ASR) | 82.3 | 68 |
| Multimodal Jailbreak Attack | HarmBench | ASR | 0 | 62 |
| Safety Alignment Breaking Prevention | HarmBench | Harmful Score (%) | 0 | 60 |
| Jailbreak Attack Success Rate | HarmBench | Attack Success Rate (Generated) | 96 | 52 |
| Harmful Prompt Refusal | HarmBench | ASR | 0 | 52 |
| Jailbreaking | HARMBENCH 159 standard behaviors (test) | ASR | 0 | 51 |
| Jailbreak | HarmBench | Toxicity Score | 1.01 | 50 |
| Jailbreak | HarmBench Standard Behaviours (200 examples) | ASR | 0 | 48 |
| Jailbreak Attack | HarmBench-191 (dev) | Attack Success Rate (ASR) | 97.4 | 42 |
| Refusal Ablation and Jailbreak Attack Success | HARMBENCH | Attack Success Rate (ASR) | 96.27 | 40 |
| Controllability | HarmBench | HarmBench Score | 87.5 | 40 |
| Safety Evaluation | Harmbench | ASR | 3.5 | 39 |
| Transferable Adversarial Attack | HarmBench Classifier (test) | TASR@1 | 88.6 | 37 |
| Jailbreak | HarmBench (HB) (standard split) | ASR@1 | 90.57 | 33 |
| Safety Evaluation | HarmBench | MD | 95 | 32 |