| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack | SafeBench | ASR 0 | 112 |
| Jailbreak Attack | SafeBench Tiny | ASR 100 | 24 |
| Jailbreak Attack | SafeBench (test) | IA ASR 92 | 20 |
| Jailbreak Attack | SafeBench | ADU Success Rate 100 | 16 |
| Multimodal Safety Evaluation | SafeBench | FS ASR 3.26 | 4 |
| Jailbreaking | SafeBench evaluated on OpenAI-o1 | FS 34.8 | 1 |