| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Jailbreak Attack | SafeBench | ASR | 0 | 128 |
| Jailbreak Attack | SafeBench | HF | 1.7 | 54 |
| Jailbreak Attack | SafeBench Tiny | ASR | 100 | 24 |
| Jailbreak Attack | SafeBench (test) | IA ASR | 92 | 20 |
| Jailbreak Attack | SafeBench | ADU Success Rate | 100 | 16 |
| Safety Evaluation | SafeBench | IA Score | 100 | 15 |
| Scene Criticality Generation | SafeBench | Collision Rate (CN) | 22 | 6 |
| Multimodal Jailbreaking | SafeBench FigStep (ID) | ASR | 92.3 | 6 |
| Multimodal Jailbreaking | SafeBench QR (ID) | ASR | 0 | 6 |
| Multimodal Jailbreaking | SafeBench Mirror (ID) | ASR | 100 | 6 |
| Safety Evaluation of Autonomous Driving | SafeBench critical scenarios | Collision Rate (SO) | 2.1 | 5 |
| Multimodal Safety Evaluation | SafeBench | FS ASR | 3.26 | 4 |
| Jailbreaking | SafeBench evaluated on OpenAI-o1 | FS | 34.8 | 1 |