WildGuardMix

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Safety Classification | WildGuardMix (test) | F1 Score | 95.3 | 47 |
| Computational Complexity Analysis | WildGuardMix 1.0 (test) | FLOPs (MFLOPs) | 0.004 | 40 |
| Safety Monitoring | WildGuardMix (test) | Accuracy | 89.9 | 40 |
| LLM Moderation | WildGuardMix (test) | ASR | 14.59 | 28 |
| Safety Evaluation | WildGuardMix | Safety Score | 0.8974 | 22 |
| Harmful prompt classification | WildGuardMix (val) | F1 Score | 98.34 | 20 |
| Safety Classification | Wildguardmix | F1 Score | 76 | 15 |
| Prompt Safety Detection | WildGuardMix (train) | AUROC | 0.8971 | 15 |
| Prompt Safety Detection | WildGuardMix (test) | AUROC | 0.8882 | 15 |
| Post-generation Inference | WildGuardMix LLaDA-2.0-mini (test) | Inference Time | 0.34 | 10 |
| Post-generation Inference | WildGuardMix LLaDA-1.5 (test) | Inference Time | 0.36 | 10 |
| Post-generation Inference | WildGuardMix LLaDA-8B-Instruct (test) | Inference Time | 0.31 | 10 |
| Post-generation Inference | WildGuardMix LLaDA-8B-Base (test) | Inference Time | 0.57 | 10 |
| Safety Classification | WildGuardMix-p (test) | F1 Score | 93.2 | 9 |
| Safety Routing | WildGuardMix | Routing F1 | 54.34 | 5 |
| Safety Routing | WildGuardMix-p | Routing F1 | 0.5054 | 5 |
| Prompt-Response Safety Routing | WildGuardMix | Routing F1 | 61.41 | 5 |
| Prompt-only Safety Routing | WildGuardMix-p | Routing F1 Score | 61.28 | 5 |
| Safety Alignment | WildGuardMix | Win Rate | 55 | 5 |
| Explainability classification | WildGuardMix human-annotated (test) | F1 Score | 60.69 | 3 |
| Response Generation | WildGuardMix | Win Count | 61 | 3 |