| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| OR-Bench | SFT (30/70) | Toxic Refusal Rate: 99.6 | 40 | 3d ago | |
| Past Tense | Qwen2.5-7B-Instruct | ASR: 51 | 40 | 3d ago | |
| XSTest Seemingly Toxic Subsets | DCR | XS: 98 | 15 | 1mo ago | |
| Harmful prompt suite (test) | | Refusal Rate: 95.5 | 15 | 1mo ago | |
| CCP Sensitive | Qwen3-Next-80B-A3B-Thinking | Reject Rate: 92.35 | 13 | 1mo ago | |
| Do-Not-Answer | Low-Rank Combination | Refusal Rate: 95.21 | 7 | 1mo ago | |
| HarmfulQA | Categorical Steering | Refusal Rate: 85.31 | 7 | 1mo ago | |
| XSTest Unsafe | Categorical Steering | Refusal Rate: 99 | 7 | 1mo ago | |
| OR-Bench Toxic | Categorical Steering | Refusal Rate: 94.66 | 7 | 1mo ago | |
| WildJailbreak Adversarial Harmful | Low-Rank Combination | Refusal Rate: 89.45 | 7 | 1mo ago | |
| WildGuard Harmful | Low-Rank Combination | Refusal Rate: 84.35 | 7 | 1mo ago | |
| CoCoNot Orig | Categorical Steering | Refusal Rate: 96.1 | 7 | 1mo ago | |
| HarmBench | LLaMA-3.1 8B | Baseline Performance: 71 | 1 | 1mo ago | |
| XSTest | LLaMA-3.1 8B | Refusal Rate (Before): 61.27 | 1 | 1mo ago | |