| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Jailbreak Attack | JailbreakBench | ASR 100 | 242 | |
| Jailbreak Attack | JailbreakBench | ASR@100 | 132 | |
| Jailbreak Robustness | JailbreakBench | Harmbench ASR 0 | 72 | |
| Unsafe Robustness | JailbreakBench | Unsafe Rate 0 | 72 | |
| Jailbreak Attack | JailbreakBench (JBB) | ASR 0 | 62 | |
| Safety Evaluation | JailbreakBench (JBB) (test) | ASR (Llama-Guard-3-8B) 1.12 | 56 | |
| Jailbreaking | JailbreakBench | Attack Success Rate (ASR) 2 | 53 | |
| Jailbreak Attack | JailbreakBench (JBB) (test) | Attack Success Rate (ASR) 98 | 42 | |
| Jailbreak Attack | JailbreakBench | Attack Success Rate (ASR) 96 | 40 | |
| Jailbreak Attack | JailbreakBench | ASR197 | 39 | |
| Jailbreak | JailbreakBench (original split) | ASR@1 95.15 | 33 | |
| Jailbreak Defense | JailbreakBench | ASR (GCG) 0 | 30 | |
| Jailbreak Attack | JailbreakBench | ASR 91 | 27 | |
| Thinking Collapse Analysis | JailbreakBench (JBB) | Thinking Collapse Rate 0 | 25 | |
| Jailbreaking | JailbreakBench | ASR (Detoxify) 0 | 20 | |
| Jailbreak Defense | JailbreakBench | Rate of Response Safety 70 | 20 | |
| Adversarial and Jailbreaking Attack Detection | JailbreakBench | AUROC 0.8622 | 20 | |
| Jailbreak Attack | JailbreakBench | Llama2 7B Attack Success Rate 77 | 18 | |
| Red-teaming Attack Success Rate | JailbreakBench (test) | ASR (Vicuna) 82 | 18 | |
| Jailbreak Safety | JailbreakBench | Reasoning Harmful Ratio 0.3 | 17 | |
| Safety Evaluation | JailbreakBench | Harmful Rate 0 | 16 | |
| Jailbreak Attack | JailbreakBench PAIR | Attack Success Rate (ASR) 82 | 15 | |
| Jailbreak Attack | JailBreakBench | JSR 0.9 | 14 | |
| Jailbreak Attack | JailbreakBench | ASR 81 | 12 | |
| Adversarial Attack | JailBreakBench non-trivial subset held-out prompts (Llama-2-7B: 280) | ASR 26 | 12 | |