| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack | StrongREJECT | Attack Success Rate: 89.1 | 138 |
| Safety Evaluation | StrongReject | Attack Success Rate: 0.64 | 65 |
| Jailbreak Defense | StrongReject | Attack Success Rate: 1.5 | 54 |
| Red-teaming Safety Evaluation | StrongReject | ASR: 7 | 32 |
| Jailbreak Robustness | StrongReject | Direct Attack Rate: 67 | 30 |
| Multi-turn Jailbreaking | StrongReject (test) | ASR: 0.34 | 30 |
| Safety and Helpfulness Evaluation | StrongREJECT | Harm Rate: 0.2 | 29 |
| Jailbreaking | StrongReject (test) | ASR (GPT-4o): 96 | 27 |
| Adversarial Attack | StrongREJECT Original (test) | CHR: 46 | 27 |
| Adversarial Attack | StrongREJECT Hijacked (test) | CHR: 0 | 27 |
| Jailbreaking | StrongREJECT | ASR (Detoxify): 0 | 20 |
| Backdoor Attack Evaluation | StrongREJECT | ASR (w/ trigger): 0.601 | 18 |
| Safety Alignment | StrongReject | Safe@1: 58 | 18 |
| Safety Evaluation | StrongReject (SR) | Reasoning Harmful Ratio: 16.7 | 17 |
| Jailbreak | StrongReject | ASR (GPT-4o): 96.1 | 12 |
| Jailbreak Robustness | StrongREJECT | Safe Response Rate: 96.33 | 12 |
| Safety Evaluation | StrongREJECT T2I | ASR: 0.3 | 10 |
| Jailbreak Evaluation | StrongReject | ASR-J: 95.5 | 9 |
| Safety assessment | StrongReject | Personalization Bias (PB): 0.179 | 9 |
| Jailbreak Attack | StrongREJECT 2024 (Full) | AK: 100 | 9 |
| Safety | SR (StrongReject) | Safety Rate: 99.7 | 8 |
| Jailbreak Safety | StrongReject | Reasoning Safety: 63.2 | 6 |
| Jailbreak Safety Evaluation | StrongREJECT (test) | Overall Score: 100 | 5 |
| Jailbreak Resistance | StrongReject | Jailbreak Resistance (Illicit Crime): 99.7 | 4 |
| Safety Evaluation | StrongReject (test) | StrongReject Score: 100 | 4 |