ProGuard

Benchmarks

Task Name	Dataset Name	SOTA Result
Content moderation	ProGuard	Score (%): 95.4	12
Unsafe content categorization	ProGuard Text	Accuracy: 76.96	9
Unsafe content categorization	ProGuard Text-Image	Accuracy: 0.6997	6
Unsafe content categorization	ProGuard Image	Accuracy: 76.02	5
OOD safety category inference (Stage 2)	ProGuard Text-Image	Mean Reward: 26.86	4
Out-of-Taxonomy Risk Detection	ProGuard Image	F1 Score: 57.59	4
Out-of-Taxonomy Risk Detection	ProGuard Text-Image	F1 Score (%): 60.25	4
Out-of-Taxonomy Risk Detection	ProGuard Text	F1 Score: 56.94	4
OOD safety category inference (Stage 2)	ProGuard Image	Mean Reward: 25.95	4
