| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Defense | ActorAttack | Attack Success Rate (ASR): 0 | 34 |
| Safety Evaluation | ActorAttack | ASR: 3.5 | 8 |
| Jailbreak Attack | ActorAttack (test) | ASR: 54 | 4 |
| Adversarial Robustness | ActorAttack (out-of-domain) | ASR: 0.435 | 4 |
| Unsafe-input detection | ActorAttack (600) | Recall: 87.83 | 2 |