Jailbreak Detection on Wildjailbreak
[Chart: F1 Score over time. Labeled point: Apriel Guard, F1 Score 96, Dec 23, 2025.]
Updated 4d ago
Evaluation Results
Method | Configuration | Date | F1 Score
Apriel Guard | Model Size=8B, Reasoni... | 2025.12 | 96
Apriel Guard | Model Size=8B, Reasoni... | 2025.12 | 96
IBM Granite Guardian | Version=3.1, Model Siz... | 2025.12 | 95
IBM Granite Guardian | Version=3.2, Model Siz... | 2025.12 | 93
IBM Granite Guardian | Version=3.2, Model Siz... | 2025.12 | 89
IBM Granite Guardian | Version=3.3, Model Siz... | 2025.12 | 89
IBM Granite Guardian | Version=3.3, Model Siz... | 2025.12 | 74
Llama Guard | Version=4, Reasoning=f... | 2025.12 | 70
Llama Guard | Version=3, Reasoning=f... | 2025.12 | 68
Llama Prompt Guard 2 | Model Size=0.086B, Rea... | 2025.12 | 60
ShieldGemma | Model Size=9B, Reasoni... | 2025.12 | 52
Llama Guard | Version=2, Reasoning=f... | 2025.12 | 50
gpt-oss-safeguard | Model Size=20B, Reason... | 2025.12 | 18
Qwen3Guard | Model Size=8B, Constra... | 2025.12 | 1
Qwen3Guard | Model Size=8B, Constra... | 2025.12 | 1