| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Safety Evaluation | SorryBench | Reasoning Success Rate (FFR) | 54.5 | 32 |
| Jailbreak Attack | SorryBench | ASR (SorryBench) | 15.5 | 26 |
| Safety Alignment Evaluation | SorryBench | Harmful Response Rate (%) | 4.2 | 18 |
| Jailbreak Attack Defense | SorryBench | FFR (Reasoning) | 56.8 | 17 |
| Jailbreaking | SorryBench | LG4 ASR | 43.6 | 8 |
| Harmful Score Evaluation | SorryBench | Harmful Score | 12.95 | 8 |
| Safety Evaluation | SorryBench | ASR | 8.22 | 6 |
| Safety Detection | SorryBench wrapped with HarmBench templates (held-out) | Detection Rate | 86.9 | 3 |
| Safety Detection | SorryBench clean condition (held-out) | Detection Rate | 95 | 3 |