| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack | StrongREJECT | Attack Success Rate: 99.4 | 262 |
| Safety Evaluation | StrongReject | Attack Success Rate: 0 | 77 |
| Jailbreak Robustness | StrongREJECT | Mean Harmful Score: 0 | 71 |
| Safety Alignment Breaking Prevention | StrongREJECT | Harmful Score (%): 0 | 60 |
| Jailbreak Defense | StrongReject | Attack Success Rate: 1.5 | 54 |
| Red-teaming Safety Evaluation | StrongReject | ASR: 4 | 53 |
| Jailbreak Robustness | StrongReject | Direct Attack Rate: 67 | 30 |
| Multi-turn Jailbreaking | StrongReject (test) | ASR: 0.34 | 30 |
| Safety and Helpfulness Evaluation | StrongREJECT | Harm Rate: 0.2 | 29 |
| Jailbreaking | StrongReject (test) | ASR (GPT-4o): 96 | 27 |
| Adversarial Attack | StrongREJECT Original (test) | CHR: 46 | 27 |
| Adversarial Attack | StrongREJECT Hijacked (test) | CHR: 0 | 27 |
| Safety Evaluation | StrongReject | H Score: 62 | 22 |
| Safety Evaluation | StrongReject | Safety Score: 97 | 21 |
| Backdoor detection | StrongREJECT prompts with triggers | TPR: 100 | 20 |
| Jailbreaking | StrongREJECT | ASR (Detoxify): 0 | 20 |
| Harmful Content Safety | StrongReject (SR) | Evaluation Score (avg@4): 100 | 18 |
| Backdoor Attack Evaluation | StrongREJECT | ASR (w/ trigger): 0.601 | 18 |
| Safety Alignment | StrongReject | Safe@1: 58 | 18 |
| Safety Evaluation | StrongReject (SR) | Reasoning Harmful Ratio: 16.7 | 17 |
| Safety Moderation | StrongReject | F1 Score: 100 | 15 |
| Safety Alignment Evaluation | StrongReject SR-PAPL | Safety Rate: 100 | 14 |
| Safety Alignment Evaluation | StrongReject SR-PAPA | Safety Rate: 100 | 14 |
| Safety Alignment Evaluation | StrongReject SR-PAP_M | Safety Rate: 100 | 14 |
| Safety Alignment Evaluation | StrongReject SR-Pair | Safety Rate: 98.72 | 14 |