| Dataset Name | SOTA Method | Metric | Score | | Updated |
|---|---|---|---|---|---|
| AEGIS (test) | Qwen3Guard-8B-Gen-strict | F1 Score | 91.4 | 26 | 5d ago |
| WildGuard (test) | Llama3-StreamGuard-8B | F1 Score | 89.5 | 22 | 12d ago |
| HarmBench (test) | Qwen3Guard-4B-Gen-strict | F1 Score | 100 | 22 | 12d ago |
| SS (test) | PolyGuard-Qwen-7B | F1 Score | 100 | 22 | 12d ago |
| AEGIS 2.0 (test) | Llama3-StreamGuard-8B | F1 Score | 87.9 | 22 | 12d ago |
| ToxicChat (test) | Qwen3Guard-4B-Gen-loose | F1 Score | 82.8 | 22 | 12d ago |
| Input Moderation Benchmark Suite (ToxicChat, OAIMod, Aegis, Aegis2, SSTest, HarmB, WildG) | Llama3-StreamGuard-8B | Macro-average F1 | 88.2 | 22 | 12d ago |