| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Jailbreak Attack | SafeBench | ASR | 0 | 245 |
| Jailbreak Attack | SafeBench | HF | 1.7 | 54 |
| Jailbreak Attack Evaluation | SafeBench (100 sampled harmful queries) | ASR | 97 | 48 |
| Jailbreak Attack | SafeBench Tiny | ASR | 100 | 24 |
| Jailbreak Attack | SafeBench (test) | IA ASR | 92 | 20 |
| Safety Evaluation | SafeBench | Overall Safety Score | 99 | 19 |
| Critical Scenario Generation | SafeBench Scenario 6 (Unprotected Left-turn) | Collision Rate (CR) | 0 | 16 |
| Jailbreak Attack | SafeBench | ADU Success Rate | 100 | 16 |
| Safety-Critical Scenario Generation | SafeBench | Collision Rate: Straight Obstacle | 30 | 8 |
| Scene Criticality Generation | SafeBench | Collision Rate (CN) | 22 | 6 |
| Multimodal Jailbreaking | SafeBench FigStep (ID) | ASR | 92.3 | 6 |
| Multimodal Jailbreaking | SafeBench QR (ID) | ASR | 0 | 6 |
| Multimodal Jailbreaking | SafeBench Mirror (ID) | ASR | 100 | 6 |
| Safety Evaluation of Autonomous Driving | SafeBench critical scenarios | Collision Rate (SO) | 2.1 | 5 |
| Multimodal Safety Evaluation | SafeBench | FS ASR | 3.26 | 4 |
| Autonomous Driving Ego Robustness Evaluation | SafeBench held-out (test) | CR (Straight Obstacle) | 5 | 3 |
| Jailbreaking | SafeBench (evaluated on OpenAI-o1) | FS | 34.8 | 1 |