| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Jailbreak Attack | AdvBench | AASR | 8,712 | 271 |
| Adversarial Attack | AdvBench (test) | ASR | 100 | 145 |
| Jailbreak Attack | AdvBench | ASR | 100 | 133 |
| Jailbreaking | AdvBench | ASR | 100 | 132 |
| Safety Evaluation | AdvBench | Safety Score | 100 | 117 |
| Jailbreak Defense | AdvBench | ASR (PAIR) | 0 | 115 |
| Jailbreaking | AdvBench selected models | ASR@10 | 100 | 90 |
| Adversarial Attack Success Rate | AdvBench | ASR | 0 | 75 |
| Jailbreak Attack | AdvBench (test) | ASR | 7,827 | 73 |
| Unsafe Robustness | AdvBench | Unsafe Rate | 0 | 72 |
| Jailbreak Robustness | AdvBench | HarmBench ASR | 0 | 72 |
| Jailbreak Attack | Advbench-M | Attack Success Rate (ASR%) | 0 | 64 |
| Jailbreak Attack | AdvBench subset | ASR | 96 | 64 |
| Jailbreak | AdvBench | Avg Queries | 2.1 | 63 |
| Jailbreaking | AdvBench | BERT Score | 4.84 | 55 |
| Safety Evaluation | AdvBench | Reasoning Harmfulness Rate | 0 | 50 |
| Jailbreak Attack | AdvBench | Attack Success Rate (ASR) | 0 | 48 |
| Jailbreak Attack | AdvBench 50 | ASR (KW) | 100 | 48 |
| Safety Refusal | AdvBench | Refusal Rate | 99.42 | 46 |
| Jailbreak Attack | AdvBench 150 Harmful Behaviors | ASR | 0 | 45 |
| Harmful Request Defense | AdvBench | ASR | 0 | 44 |
| Safety Evaluation | AdvBench Safety Evaluation | ASR (S1) | 1.35 | 42 |
| Jailbreak Attack | AdvBench | ASR | 0 | 42 |
| Jailbreak Attack | AdvBench gray-box setting | ASR | 100 | 42 |
| Jailbreak Attack | AdvBench | Attack Success Rate (ASR) | 26.7 | 40 |