| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Jailbreak | JBB | Jailbreak Rate | 0 | 70 |
| Safety Performance | JBB | Refusal Score (CR) | 53 | 35 |
| Safety Evaluation | JBB | Safe@1 | 59.67 | 18 |
| Jailbreak Attack | JBB-Behaviors | S2C (Ours) | 100 | 16 |
| Jailbreak Attack | JBB Qwen3-4B | Loss | 0.149 | 13 |
| Jailbreak Attack | JBB | Llama2-7B ASR | 91.7 | 12 |
| Jailbreak Attack Evaluation | JBB sampled harmful behaviors | PAIR Success Rate | 100 | 12 |
| Jailbreak Attack | JBB Gemma3-4B | Loss | 0.348 | 8 |
| Jailbreak Attack | JBB Llama2-7B | Loss | 0.106 | 8 |
| Jailbreak Attack | JBB Llama3.1-8B | Loss | 0.466 | 7 |
| Jailbreak Attack | JBB | ASR (Paper) | 97 | 7 |
| Safety Evaluation | JBB | Score | 93 | 6 |
| Multi-label Safety Categorization | jbb behaviors category | Macro Accuracy | 59.37 | 4 |
| Multi-label Safety Categorization | jbb behaviors behavior | Macro Accuracy | 72.17 | 4 |
| Robustness to Jailbreak Attacks | JBB Paraphrased | Harmful Reasoning Ratio | 16.1 | 3 |
| Jailbreak Attack Evaluation | JBB | Attack Success Count | 1 | 2 |
| Transfer Jailbreak Attack | JBB Sonnet-4 | ASR | 21.5 | 2 |
| Transfer Jailbreak Attack | JBB GPT-5-mini | Attack Success Rate (ASR) | 60 | 2 |
| Transfer Jailbreak Attack | JBB Target: Gemini-3-flash | ASR | 5.6 | 2 |