| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack | AdvBench | AASR 8,712 | 263 |
| Safety Evaluation | Advbench | Safety Score 100 | 117 |
| Jailbreaking | AdvBench | ASR 100 | 114 |
| Adversarial Attack Success Rate | AdvBench | ASR 0 | 75 |
| Jailbreak Attack | AdvBench (test) | ASR 7,827 | 73 |
| Jailbreak Attack | Advbench-M | Attack Success Rate (ASR%) 0 | 64 |
| Jailbreak | AdvBench | Avg Queries 2.1 | 63 |
| Jailbreaking | AdvBench | BERT Score 4.84 | 55 |
| Jailbreak Defense | AdvBench | ASR (Overall) 0 | 49 |
| Jailbreak Attack | AdvBench | Attack Success Rate (ASR) 0 | 48 |
| Jailbreak Attack | AdvBench 50 | ASR (KW) 100 | 48 |
| Safety Refusal | AdvBench | Refusal Rate 99.42 | 46 |
| Jailbreak Attack | AdvBench 150 Harmful Behaviors | ASR 0 | 45 |
| Harmful Request Defense | AdvBench | ASR 0 | 44 |
| Jailbreak Attack | AdvBench | Attack Success Rate (ASR) 26.7 | 40 |
| Jailbreak Defense | AdvBench PAD | ASR 12.12 | 40 |
| Jailbreak Attack | AdvBench-50 + Malicious Instruct | ASR 100 | 40 |
| Jailbreaking | AdvBench Sub | BERT Score 4.73 | 40 |
| Transferable Adversarial Attack | AdvBench LLM Classifier (test) | TASR@19,260 | 39 |
| Jailbreak Attack | AdvBench GPT-3.5-turbo 1.0 (test) | Attack Success Rate 97.12 | 38 |
| Jailbreak Attack | AdvBench | Avg.Q 93.48 | 36 |
| Jailbreak Defense | AdvBench PAIR attack | DSR 98 | 35 |
| Jailbreaking | AdvBench (test) | Average ASR 99.04 | 33 |
| Safety evaluation | AdvBench 50 examples | Safe Response Rate 100 | 32 |
| Visual Jailbreaking Attack | AdvBench | ASR 43.84 | 32 |