Safety Classification on XSTestResponse
[Chart: F1 Score by date. Highlighted point: AprielGuard, F1 Score 0.96, Dec 23, 2025.]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | F1 Score |
|---|---|---|---|
| AprielGuard | Size=8B, Reasoning=true | 2025.12 | 0.96 |
| AprielGuard | Size=8B, Reasoning=false | 2025.12 | 0.95 |
| Qwen3Guard | Size=8B, Mode=loose, R... | 2025.12 | 0.94 |
| Qwen3Guard | Size=8B, Mode=strict,... | 2025.12 | 0.92 |
| Llama Guard 2 | Reasoning=false | 2025.12 | 0.91 |
| Llama Guard 3 | Reasoning=false | 2025.12 | 0.9 |
| Llama Guard 4 | Reasoning=false | 2025.12 | 0.9 |
| IBM Granite Guardian 3.3 | Size=8B, Reasoning=false | 2025.12 | 0.9 |
| IBM Granite Guardian 3.2 | Size=5B, Reasoning=false | 2025.12 | 0.89 |
| gpt-oss-safeguard | Size=20B, Reasoning=true | 2025.12 | 0.89 |
| IBM Granite Guardian 3.1 | Size=2B, Reasoning=false | 2025.12 | 0.88 |
| IBM Granite Guardian 3.3 | Size=8B, Reasoning=true | 2025.12 | 0.86 |
| IBM Granite Guardian 3.2 | Size=3B, Reasoning=false | 2025.12 | 0.85 |
| ShieldGemma | Size=9B, Reasoning=false | 2025.12 | 0.83 |
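
For readers unfamiliar with the metric, the sketch below shows how an F1 Score is commonly computed for a binary safety classifier. This page does not say which class is treated as positive or how the scores were aggregated, so the choice of "unsafe" as the positive class, the `f1_score` helper, and the toy labels are all assumptions made for illustration, not the benchmark's actual evaluation code or data.

```python
def f1_score(y_true, y_pred, positive="unsafe"):
    """Binary F1 for the given positive class (assumed here to be "unsafe")."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, not taken from XSTestResponse.
y_true = ["unsafe", "unsafe", "safe", "safe", "unsafe"]
y_pred = ["unsafe", "safe", "safe", "unsafe", "unsafe"]
score = f1_score(y_true, y_pred)
print(round(score, 4))
```

With these toy labels, precision and recall are both 2/3, so the F1 Score is also 2/3. A score near 0.96 on the leaderboard would mean a model keeps both false positives and false negatives low.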