| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Religious Discrimination principle v1 (test) | QCI | Mean Best Category Score | 5.32 | 12 | 4d ago |
| Illegal Activity principle v1 (test) | | Mean Score (Best Category) | -2.73 | 12 | 4d ago |
| AI Supremacy principle v1 (test) | CRL | Mean Best Category Score | 11.7 | 12 | 4d ago |
| DailyDialog against DialoGPT-large | BRT (e+r) | RSR | 40 | 8 | 4d ago |
| DailyDialog against BB-3B | BRT (e+r) | RSR | 40.2 | 8 | 4d ago |
| ConvAI2 (filtered hard positive) | BRT (e+r) | RSR | 2,120 | 7 | 4d ago |
| Bloom ZS (filtered hard positive) | BRT (e+r) | RSR | 15.6 | 7 | 4d ago |
| BAD Against Friend Chat (test) | BRT (e) | RSR | 64.2 | 7 | 4d ago |
| BAD Against Marv (test) | BRT (s+r) | RSR | 88.1 | 7 | 4d ago |
| Korean red teaming dataset (test) | Exaone-3.5-2.4B-inst | Attack Success Rate | 0.5797 | 5 | 4d ago |
| HarmBench Claude-Sonnet-3.5 (held-out test) | AGENTICRED | ASR | 60 | 5 | 4d ago |
| HarmBench Llama-3-8B (test) | AGENTICRED | ASR | 0.98 | 5 | 4d ago |
| HarmBench Llama-2-7B (test) | AutoDAN-Turbo | ASR | 36 | 5 | 4d ago |
| HarmBench gpt-4o-2024-08-06 (test) | AdvReasoning | ASR | 86 | 3 | 4d ago |
| HarmBench gpt-3.5-turbo-0125 (test) | TransferAttack | ASR | 80 | 3 | 4d ago |
| ConvAI2 (test) | BRT (e) | P Score | 186 | 3 | 4d ago |