| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| EXPGUARD (test) | EXPGUARD | Financial Score | 96.7 | 40 | 1mo ago |
| BeaverTails V Text-Image Response | Qwen3Guard-8B-Gen | F1 Score | 84.8 | 39 | 1mo ago |
| Aegis Text Response 2.0 | NemoGuard-8B | F1 Score | 86.3 | 32 | 1mo ago |
| Public Safety Benchmarks Response Suite | BeaverDam | BeaverT Score | 89.9 | 16 | 1mo ago |
| XSTest Text Response | GuardReasoner-8B | F1 Score | 98.43 | 16 | 1mo ago |
| Wild Guard Text Response | DynaGuard-8B | F1 Score | 93.17 | 16 | 1mo ago |
| Generic Response Classification Suite (Aegis2.0, Beavertails, SEval, SafeRLHF, Think, WildG, XSTest) | Qwen3Guard-Gen-4B | Aegis2.0 | 86.5 | 16 | 1mo ago |
| SEA-SafeguardBench CG Cultural | SEA-Guard | AUPRC (English) | 75.4 | 16 | 1mo ago |
| SafeQA English | Qwen3Guard-Gen 8B | AUPRC | 97.7 | 9 | 1mo ago |
| SEA-SafeguardBench | Qwen3Guard-Gen 8B | AUPRC | 89.7 | 9 | 1mo ago |
| SEA-SafeguardBench English | LlamaGuard-3 8B | AUPRC | 92.1 | 9 | 1mo ago |