Safety Classification on XSTest
[Chart: F1 Score over time, Dec 2, 2025 to Dec 23, 2025. Best result: AprielGuard, 94.]
Evaluation Results
Method                   | Configuration             | Date    | F1 Score
-------------------------|---------------------------|---------|---------
AprielGuard              | Size=8B, Reasoning=false  | 2025.12 | 94
Qwen3Guard               | Size=8B, Mode=strict,...  | 2025.12 | 91
AprielGuard              | Size=8B, Reasoning=true   | 2025.12 | 91
Qwen3Guard               | Size=8B, Mode=loose, R... | 2025.12 | 90
gpt-oss-safeguard        | Size=20B, Reasoning=true  | 2025.12 | 90
Llama Guard 2            | Reasoning=false           | 2025.12 | 89
Llama Guard 3            | Reasoning=false           | 2025.12 | 88
IBM Granite Guardian 3.3 | Size=8B, Reasoning=true   | 2025.12 | 87
IBM Granite Guardian 3.3 | Size=8B, Reasoning=false  | 2025.12 | 86
IBM Granite Guardian 3.2 | Size=5B, Reasoning=false  | 2025.12 | 85
Llama Guard 4            | Reasoning=false           | 2025.12 | 84
IBM Granite Guardian 3.1 | Size=2B, Reasoning=false  | 2025.12 | 83
ShieldGemma              | Size=9B, Reasoning=false  | 2025.12 | 82
IBM Granite Guardian 3.2 | Size=3B, Reasoning=false  | 2025.12 | 81
CREST-LARGE              | Model Variant=Large, B... | 2025.12 | 69.83
CREST-BASE               | Model Variant=Base, Ba... | 2025.12 | 67.04
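The leaderboard ranks guard models by F1 score but does not state which class counts as positive or how scores are averaged. As a minimal sketch, assuming a binary safe/unsafe task with "unsafe" (label 1) as the positive class and scores reported on a 0 to 100 scale, F1 is the harmonic mean of precision and recall, which simplifies to 2·TP / (2·TP + FP + FN). The function name `binary_f1` and the toy labels are hypothetical illustrations, not the benchmark's own scoring code.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Hypothetical helper: F1 for one positive class, 2PR/(P+R) = 2TP / (2TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # unsafe prompt correctly flagged
        elif p == positive:
            fp += 1  # safe prompt wrongly flagged as unsafe
        elif t == positive:
            fn += 1  # unsafe prompt missed
    denom = 2 * tp + fp + fn
    # Assumption: with no positives in either labels or predictions, report 0.0.
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels: 1 = unsafe, 0 = safe.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(round(100 * binary_f1(y_true, y_pred), 2))  # → 66.67
```

Here TP = 2, FP = 1 and FN = 1, so precision and recall are both 2/3 and F1 is 2/3, which a leaderboard on this scale would show as 66.67. For binary labels, scikit-learn's `sklearn.metrics.f1_score` computes the same quantity with its default `average="binary"` and `pos_label=1`.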