| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| JBB-Behaviors Scenario J3 | EvoJail | Hypervolume | 0.707 | 21 | 26d ago |
| JBB-Behaviors Scenario J2 | EvoJail | Hypervolume | 0.691 | 21 | 26d ago |
| JBB-Behaviors Scenario J1 | EvoJail | Hypervolume | 59.1 | 21 | 26d ago |
| GPTFuzzer Scenario G3 | EvoJail | Hypervolume | 0.696 | 21 | 26d ago |
| GPTFuzzer Scenario G2 | EvoJail | Hypervolume | 77 | 21 | 26d ago |
| GPTFuzzer Scenario G1 | EvoJail | Hypervolume | 0.708 | 21 | 26d ago |
| 100-query jailbreak set | | Jailbreak Success Rate | 46.4 | 8 | 1mo ago |
| AdvBench GPT-4 Series | AJF | ASR | 98.9 | 5 | 1mo ago |
| AdvBench Llama2-13b | GPTFuzz Top-5 | ASR | 95.4 | 5 | 1mo ago |
| AdvBench Llama2-7b | GPTFuzz Top-5 | Attack Success Rate (ASR) | 97.3 | 5 | 1mo ago |