| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack | AdvBench | AASR 8,712 | 247 |
| Safety Evaluation | AdvBench | Safety Score 100 | 117 |
| Adversarial Attack Success Rate | AdvBench | ASR 0 | 75 |
| Jailbreak | AdvBench | Avg Queries 2.1 | 63 |
| Jailbreak Defense | AdvBench | ASR (Overall) 0 | 49 |
| Jailbreak Attack | AdvBench 50 | ASR (KW) 100 | 48 |
| Harmful Request Defense | AdvBench | ASR 0 | 44 |
| Jailbreaking | AdvBench | ASR 99.2 | 44 |
| Transferable Adversarial Attack | AdvBench LLM Classifier (test) | TASR@19,260 | 39 |
| Jailbreak Defense | AdvBench PAIR attack | DSR 98 | 35 |
| Safety Evaluation | AdvBench 50 examples | Safe Response Rate 100 | 32 |
| Jailbreak Attack | AdvBench AdvSub | QSR 100 | 30 |
| Jailbreak | AdvBench Ensemble configuration GPT-4o | Attack Success Rate (ASR) 0 | 25 |
| Jailbreak Attack | AdvBench GPT-3.5-turbo 1.0 (test) | Attack Success Rate 97.12 | 22 |
| Jailbreak Attack | AdvBench (test) | ASR (HILL) 98 | 22 |
| Jailbreak Attack | AdvBench Vicuna-33B Guard 100 prompts Original | ASR 0 | 21 |
| Jailbreak Attack | AdvBench Llama2-70B Guard 100 prompts Original | ASR 0 | 21 |
| Adversarial and Jailbreaking Attack Detection | AdvBench | AUROC 0.9675 | 20 |
| Adversarial Attack | AdvBench (query-specific) | MR 37 | 20 |
| Priming Attack Robustness | AdvBench No Attack (test) | ASR (GPT-4o) 0 | 18 |
| Jailbreak Robustness | AdvBench | PAIR ASR (GPT-4o) 4 | 18 |
| Jailbreak Attack Success Rate | AdvBench-x | ASR (English) 7.94 | 18 |
| Jailbreak Attack | AdvBench Llama3.1-405B 1.0 (test) | ASR 0 | 17 |
| Jailbreak Attack | AdvBench GPT-4o-mini 1.0 (test) | ASR 81.35 | 17 |
| Jailbreak Attack | AdvBench GPT-4o 1.0 (test) | ASR 92.67 | 17 |