| Dataset Name | SOTA Method | Metric | Score (%) | Trend | Last Updated |
|---|---|---|---|---|---|
| Misuse Categories Aggregate Summary | GAVEL | AUC | 99 | 9 | 4d ago |
| Misuse Categories Automation (e-commerce) | Activation Classifier | AUC | 100 | 9 | 4d ago |
| Misuse Categories Scam (Romance) | Activation Classifier | AUC | 100 | 9 | 4d ago |
| Misuse Categories Scam (Tax Authority) | Llama Guard 4 (Meta) | AUC | 99 | 9 | 4d ago |
| Misuse Categories Scam (Racism) | GAVEL | AUC | 100 | 9 | 4d ago |
| Misuse Categories Scam (Elections) | GAVEL | AUC | 99 | 9 | 4d ago |
| Misuse Categories Psychological Harm (Anti-LGBTQ) | | AUC | 100 | 9 | 4d ago |
| Misuse Categories Psychological Harm (Delusional) | GAVEL | AUC | 99 | 9 | 4d ago |
| Misuse Categories Cybercrime (SQL Injection) | Activation Classifier | AUC | 99 | 9 | 4d ago |
| Misuse Categories Cybercrime (Phishing) | RepBending | AUC | 99 | 9 | 4d ago |
| ToxiGen Homophobia (external) | GAVEL | TPR | 98 | 1 | 4d ago |
| ToxiGen Ethnoracial (external) | GAVEL | TPR | 91 | 1 | 4d ago |
| Reasoning Shield Political Risk (external) | GAVEL | TPR | 97 | 1 | 4d ago |
| PKU Phishing Guidance (external) | GAVEL | TPR | 76 | 1 | 4d ago |