Prompt classification on SorryB
Top result: PolyGuard, F1 Score 97.2
[Chart: F1 Score by date; data point at Jan 22, 2026]
Updated 4d ago
Evaluation Results

Method             Configuration              Date     F1 Score
PolyGuard          Model Size=7B              2026.01  97.2
Qwen3Guard-Gen     Model Size=4B, Evaluat...  2026.01  95.1
Qwen3Guard-Gen     Model Size=8B, Evaluat...  2026.01  94.3
Qwen3Guard-Gen     Model Size=0.6B, Evalu...  2026.01  93.6
YuFeng-XGuard      Model Size=8B              2026.01  93.2
Qwen3Guard-Gen     Model Size=0.6B, Evalu...  2026.01  91.2
YuFeng-XGuard      Model Size=0.6B            2026.01  91.2
Qwen3Guard-Gen     Model Size=4B, Evaluat...  2026.01  90.4
WildGuard          Model Size=7B              2026.01  90
NemotronReasoning  Model Size=4B              2026.01  88.8
Qwen3Guard-Gen     Model Size=8B, Evaluat...  2026.01  88.4
GPT-OSS-SafeGuard  Model Size=20B             2026.01  88
Llama3Guard        Model Size=8B              2026.01  87.1
NemotronGuardV2    Model Size=8B              2026.01  78.4
Llama4Guard        Model Size=12B             2026.01  73.2
ShieldGemma        Model Size=9B              2026.01  63.9