Input Moderation

Benchmarks

Dataset Name	SOTA Method	Metric	Value		Updated
ToxicChat (test)	Qwen3Guard-4B-Gen-loose	F1 Score	82.8	42	2mo ago
AEGIS (test)	Qwen3Guard-8B-Gen-strict	F1 Score	91.4	26	3mo ago
WildGuard (test)	Llama3-StreamGuard-8B	F1 Score	89.5	22	3mo ago
HarmBench (test)	Qwen3Guard-4B-Gen-strict	F1 Score	100	22	3mo ago
SS (test)	PolyGuard-Qwen-7B	F1 Score	100	22	3mo ago
AEGIS 2.0 (test)	Llama3-StreamGuard-8B	F1 Score	87.9	22	3mo ago
Input Moderation Benchmark Suite (ToxicChat, OAIMod, Aegis, Aegis2, SSTest, HarmB, WildG)	Llama3-StreamGuard-8B	Macro-average F1	88.2	22	3mo ago
Harmful safety datasets Average	MLPM	Average F1 Score (Input Moderation)	88.33	9	1mo ago
