
WildGuardMix

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Safety Classification | WildGuardMix (test) | F1 (Unsafe) 75.83 | 27
Safety Classification | Wildguardmix | F1 Score 76 | 15
Prompt Safety Detection | WildGuardMix (train) | AUROC 0.8971 | 15
Prompt Safety Detection | WildGuardMix (test) | AUROC 0.8882 | 15
Safety Classification | WildGuardMix-p (test) | F1 Score 93.2 | 9
Safety Routing | WildGuardMix | Routing F1 54.34 | 5
Safety Routing | WildGuardMix-p | Routing F1 0.5054 | 5
Prompt-Response Safety Routing | WildGuardMix | Routing F1 61.41 | 5
Prompt-only Safety Routing | WildGuardMix-p | Routing F1 Score 61.28 | 5
Safety Alignment | WildGuardMix | Win Rate 55 | 5
Explainability classification | WildGuardMix human-annotated (test) | F1 Score 60.69 | 3
Response Generation | WildGuardMix | Win Count 61 | 3