| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Safety Evaluation | WildJailbreak (held-out) | Attack Success Rate (ASR) | 0 | 50 |
| Safety Alignment | WildJailbreak | Trainable parameters (M) | 15,768.31 | 44 |
| Over-refusal | Wildjailbreak (Benign) | Wildjailbreak Benign Refusal Rate | 28.4 | 42 |
| Safety | WildJailbreak | Harmful Response Ratio | 17.6 | 21 |
| Safety Evaluation | WildJailbreak | Helpfulness Score (H) | 0.8592 | 20 |
| Safety Moderation | WILDJAILBREAK (val) | ASR | 0.7 | 18 |
| Jailbreak Detection | Wildjailbreak | F1 Score | 96 | 15 |
| Safety Performance | WildJailbreak | Selective Refusal Score (Δs) | 90.1 | 11 |