Prompt classification on XSTest
[Chart: F1 Score over time. Best result: WildGuard, F1 Score 94.8, Jan 22, 2026]
Evaluation Results
Method             Configuration               Date     F1 Score
WildGuard          Model Size=7B               2026.01  94.8
YuFeng-XGuard      Model Size=8B               2026.01  94.4
PolyGuard          Model Size=7B               2026.01  92.2
YuFeng-XGuard      Model Size=0.6B             2026.01  91.6
Qwen3Guard-Gen     Model Size=8B, Evaluat...   2026.01  90.8
Qwen3Guard-Gen     Model Size=4B, Evaluat...   2026.01  89.9
GPT-OSS-SafeGuard  Model Size=20B              2026.01  89.9
Qwen3Guard-Gen     Model Size=8B, Evaluat...   2026.01  89.3
Llama3Guard        Model Size=8B               2026.01  88.4
Qwen3Guard-Gen     Model Size=4B, Evaluat...   2026.01  87.7
Qwen3Guard-Gen     Model Size=0.6B, Evalu...   2026.01  85.7
NemotronReasoning  Model Size=4B               2026.01  84.8
Qwen3Guard-Gen     Model Size=0.6B, Evalu...   2026.01  84.6
NemotronGuardV2    Model Size=8B               2026.01  84.1
Llama4Guard        Model Size=12B              2026.01  83.3
ShieldGemma        Model Size=9B               2026.01  82.8
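
For readers unfamiliar with the metric, the sketch below shows how an F1 score for binary prompt classification is commonly computed. It is an illustration only, not the evaluation code behind these results. It assumes that each prompt carries a "safe" or "unsafe" label and that "unsafe" is the positive class. The page does not state either point, and the function name and sample data are hypothetical.

```python
def f1_score(labels, predictions, positive="unsafe"):
    """Return the F1 score (0.0 to 1.0) for one positive class.

    Assumption: "unsafe" is the positive class; the leaderboard does not
    say which class it treats as positive.
    """
    pairs = list(zip(labels, predictions))
    # True positives: unsafe prompts correctly flagged as unsafe.
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    # False positives: safe prompts wrongly flagged as unsafe.
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    # False negatives: unsafe prompts that were missed.
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample data, not taken from XSTest.
labels = ["unsafe", "unsafe", "safe", "safe", "unsafe"]
predictions = ["unsafe", "safe", "safe", "unsafe", "unsafe"]
score = f1_score(labels, predictions)
print(f"F1 = {100 * score:.1f}")
```

The scores in the table appear to be on a 0 to 100 scale, so the example multiplies by 100 before printing. Because F1 balances precision and recall, a classifier that over-refuses safe prompts loses points just as one that misses unsafe prompts does.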