| Task Name | Dataset Name | SOTA Result (metric, value) | Trend |
|---|---|---|---|
| Safety Evaluation | WildJailbreak | ASR 0.101 | 53 |
| Safety Evaluation | WildJailbreak (held-out) | Attack Success Rate (ASR) 0 | 50 |
| Over-refusal | WildJailbreak (Benign) | WildJailbreak Benign Refusal Rate 1.43 | 49 |
| Safety Alignment | WildJailbreak | Trainable parameters (M) 15,768.31 | 44 |
| Safety Alignment | WildJailbreak | Safe@1 77.2 | 24 |
| Safety | WildJailbreak | Harmful Response Ratio 17.6 | 21 |
| Safety Moderation | WildJailbreak (val) | ASR 0.7 | 18 |
| Jailbreak Safety | WildJailbreak | Reasoning Harmful Ratio 17.3 | 17 |
| Jailbreak Detection | WildJailbreak | F1 Score 96 | 15 |
| Safety Performance | WildJailbreak | Selective Refusal Score (Δs) 90.1 | 11 |
| Refusal Evaluation | WildJailbreak Adversarial Harmful | Refusal Rate 89.45 | 7 |
| Jailbreak | WildJailbreak Forbidden Questions (Overall) | ASR 92.1 | 2 |