| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack Resistance Evaluation | Jailbreak | MMLU 62.3 | 40 |
| Jailbreak Detection | Jailbreak data (70/30 stratified) | AUC 100 | 32 |
| Classification | JAILBREAK (test) | Accuracy 98.8 | 32 |
| Jailbreak | Jailbreak | MMLU 65.5 | 20 |
| Prompt Injection Defense | Jailbreak MI-FGSM | Attack Success Rate 4 | 12 |
| Prompt Injection Defense | Jailbreak APGD | ASR 1 | 12 |
| Jailbreak Evaluation | JailBreak R1 | Attack Success Rate (ASR) 1.3 | 12 |
| Safety Evaluation | JailBreak-R1 | LRM-JailBreak Score 0 | 12 |
| Steering | Jailbreak | Steering Success 82.5 | 11 |
| Jailbreak Robustness | Jailbreak Cipher, CodeChameleon (test) | Cipher Success Rate 95.75 | 10 |
| LLM Red-teaming | Jailbreak R1-defended Target Model | UA 87.67 | 9 |
| Adversarial Robustness | Jailbreak Structured-based | Perplexity 2.62 | 9 |
| Adversarial Robustness | Jailbreak Perturbation-based | Perplexity 2.58 | 9 |
| Jailbreaking | JailBreak | LG4 ASR 1 | 8 |
| OOD Detection | Jailbreak (test) | Length-Matched AUROC 85.8 | 5 |
| Jailbreak resistance evaluation | Jailbreak augmented examples | Not Unsafe Rate 100 | 4 |
| Calibration Analysis | Jailbreak | AUROC 90 | 2 |
| Jailbreak Detection | Jailbreak V28K | Accuracy 99.98 | 2 |