| Dataset Name | SOTA Method | Metric | SOTA Value | Trend | Updated |
|---|---|---|---|---|---|
| WildJailbreak | LLaDA-Instruct | Unsafe Rate | 0 | 144 | 22d ago |
| JailbreakBench | LLaDA-Instruct | HarmBench ASR | 0 | 72 | 22d ago |
| HarmBench | Dream-Instruct | HarmBench ASR | 0 | 72 | 22d ago |
| AdvBench | LLaDA-Instruct | HarmBench ASR | 0 | 72 | 22d ago |
| StrongREJECT | LLaDA-Instruct | Mean Harmful Score | 0 | 71 | 22d ago |
| StrongReject | | Direct Attack Rate | 67 | 30 | 14d ago |
| JBB-Behaviors (test) | MOSA | ASR | 0 | 24 | 3mo ago |
| Jailbreak Attacks | | Prefill Success Rate | 88.8 | 18 | 7d ago |
| HarmBench 1.0 (test) | Circuit-Breaker (CB) | GCG Attack Success Rate | 3.13 | 18 | 8d ago |
| JB-R1 | Self-ReSET | Evaluation Score (avg@4) | 98.1 | 18 | 22d ago |
| safe-unlearning | Self-ReSET | Avg Evaluation Score (k=4) | 98 | 18 | 22d ago |
| WildTeaming WJ | Self-ReSET | Evaluation Score (avg@4) | 95.1 | 18 | 22d ago |
| JBB-Behaviors | RA (ours) | ASR (PAIR, Guardrail Model) | 0.3 | 18 | 3mo ago |
| Mousetrap | SAFEPATH-ZS | Harmfulness Rate | 0 | 17 | 27d ago |
| AutoRAN | RealSafe-R1 | Harmfulness Rate | 0 | 17 | 27d ago |
| Jailbreak Cipher, CodeChameleon (test) | DASE | Cipher Success Rate | 95.75 | 10 | 3mo ago |
| StrongReject Static PAIR threshold 0.75 (test) | BCT | ASR | 0 | 9 | 12d ago |
| AutoDAN Adv single-turn attack | | ASR | 81 | 8 | 1d ago |
| AutoDAN Harm single-turn attack | | Attack Success Rate (ASR) | 76.9 | 8 | 1d ago |
| StrongREJECT PAP | REFLECTOR | Goodness Score | 93.28 | 6 | 13d ago |
| StrongREJECT PAIR | REFLECTOR | Goodness Score | 76.65 | 6 | 13d ago |
| TAP | Goal | ASR | 4.33 | 4 | 1mo ago |
| DAN Static | FedDetox | ASR | 74.8 | 3 | 1mo ago |