| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| Alpaca + XSTest (val) | wopt | AUROC 0.988 | 42 | 1mo ago | |
| WILDGUARD (test) | WILDGUARD | F1 (Harmful) 94 | 14 | 3mo ago | |
| Do-Not-Answer Portuguese (test) | gov.pt baseline | Accuracy 100 | 9 | 3mo ago | |
| XSTEST-RESP (full) | GPT-4 | RR (F1) 98.1 | 9 | 3mo ago | |
| XSTest Refusal OOD (test) | ACS Refusal Probe | MC Recall 100 | 7 | 22d ago | |
| JailbreakBench Refusal OOD (test) | ACS Refusal Probe | MC Recall 100 | 7 | 22d ago | |