| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| Beavertails | | Violation Rate 1.2 | 32 | 4d ago |
| SafeChain | Safety | SafeChain Score 4.94 | 27 | 2d ago |
| Safety-Tuned | FuseLLM | Safety-Tuned Score 3.06 | 27 | 2d ago |
| Safety Evaluation Suite (Salad-Bench, WildJailbreak, JailbreakBench, WildChat, WildGuard) | | Safety Rate (S.R.) 100 | 24 | 4d ago |
| T3 | | T3 Score 85.1 | 21 | 4d ago |
| WildJailbreak | STAR-1 | Harmful Response Ratio 17.6 | 21 | 4d ago |
| HarmBench, SafeChain Safety-Tuned | LED | Safety-Tuned Score 2.41 | 18 | 2d ago |
| 8 jailbreak attacks (Aggregated) | REPBEND | Average ASR 3.13 | 15 | 4d ago |
| AIR-Bench | REINFORCE++ (Ours) | Average Score 0.66 | 12 | 4d ago |
| AttaQ | REINFORCE++ (Ours) | Average Score 0.81 | 12 | 4d ago |
| Safety Evaluation Suite | OLMo 2 7B Inst | Score 0.911 | 9 | 4d ago |
| SR (StrongReject) | | Safety Rate 99.7 | 8 | 4d ago |
| Cultural Kaleidoscope | DuoGuard | F1 Score 76.6 | 7 | 4d ago |
| IndicSafe En | PG-Qwen | F1 Score 91.39 | 7 | 4d ago |
| KGC-SAFETY in-house | K-EXAONE | Safety Score 88.4 | 4 | 4d ago |
| WildChat | DPO_Whisperer | Safe Response Rate 94.22 | 2 | 4d ago |