
WildGuard

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Response Harmfulness Classification | WildGuard (test) | F1 (Total): 79.48 | 30 |
| Safety Evaluation | Wildguard (test) | Wildguard Test Score: 0.08 | 27 |
| Prompt Classification | WildGuard Text Prompt | F1 Score: 90.46 | 14 |
| Refusal Detection | WILDGUARD (test) | F1 (Harmful): 94 | 14 |
| Text-based safety moderation | WildGuard | F1 Score: 78.6 | 12 |
| Prompt Harmfulness Classification | WILDGUARD (test) | F1 (Total): 88.9 | 12 |
| Jailbreak Attack | WildGuard (test) | ASR: 82.64 | 8 |
| Safety Moderation | WildGuard Prompt | F1 Score: 89.5 | 7 |
| Audio Safety Moderation | WildGuard-TTS | F1 Score: 88.4 | 7 |
| SCAV-Embedding Attack Defense | Wildguard (test) | ASR: 28.84 | 4 |