
WildGuardMix

Benchmarks

| Task Name | Dataset Name | SOTA Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Safety Classification | WildGuardMix (test) | F1 (Unsafe) | 75.83 | 27 |
| Safety Evaluation | WildGuardMix | Safety Score | 0.8974 | 22 |
| Safety Classification | WildGuardMix | F1 Score | 76 | 15 |
| Prompt Safety Detection | WildGuardMix (train) | AUROC | 0.8971 | 15 |
| Prompt Safety Detection | WildGuardMix (test) | AUROC | 0.8882 | 15 |
| Safety Classification | WildGuardMix-p (test) | F1 Score | 93.2 | 9 |
| Safety Routing | WildGuardMix | Routing F1 | 54.34 | 5 |
| Safety Routing | WildGuardMix-p | Routing F1 | 0.5054 | 5 |
| Prompt-Response Safety Routing | WildGuardMix | Routing F1 | 61.41 | 5 |
| Prompt-only Safety Routing | WildGuardMix-p | Routing F1 Score | 61.28 | 5 |
| Safety Alignment | WildGuardMix | Win Rate | 55 | 5 |
| Explainability Classification | WildGuardMix human-annotated (test) | F1 Score | 60.69 | 3 |
| Response Generation | WildGuardMix | Win Count | 61 | 3 |
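The table reports several metrics without defining them. As a rough guide, the sketch below gives generic definitions of two of them, assuming binary labels where 1 marks an unsafe item: F1 with the unsafe class treated as the positive class (one plausible reading of "F1 (Unsafe)"), and AUROC computed as the probability that a randomly chosen unsafe example receives a higher score than a randomly chosen safe one. The function names `f1_unsafe` and `auroc` and the toy data are hypothetical; this is not the evaluation code behind any entry in the table. Note also that the table mixes scales: some results are percentages (e.g. 75.83) and others are proportions (e.g. 0.8974).

```python
from typing import Sequence


def f1_unsafe(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 with the unsafe class as positive (assumption: label 1 = unsafe)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        # No correctly flagged unsafe items: precision and recall are both 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random unsafe item > score of a random safe item).

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for illustration but slow on large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one unsafe and one safe example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = unsafe, 0 = safe.
labels = [1, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 1, 1, 0]               # hard classifier decisions
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]  # classifier's unsafe-probability scores

print(f"F1 (Unsafe): {100 * f1_unsafe(labels, preds):.2f}")  # F1 (Unsafe): 66.67
print(f"AUROC: {auroc(labels, scores):.4f}")                 # AUROC: 0.8889
```

Multiplying by 100 mirrors the percentage-style results in the table; reporting the raw value mirrors the proportion-style ones.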