Prompt classification on Aegis
[Chart: F1 Score by date. Best result: NemotronReasoning, 89.6 F1 Score. Date shown: Jan 22, 2026]
Updated 4d ago
Evaluation Results
Method             Details                     Date     F1 Score
NemotronReasoning  Model Size=4B               2026.01  89.6
PolyGuard          Model Size=7B               2026.01  89.6
YuFeng-XGuard      Model Size=8B               2026.01  89.6
Qwen3Guard-Gen     Model Size=8B, Evaluat...   2026.01  88.7
YuFeng-XGuard      Model Size=0.6B             2026.01  88.3
Qwen3Guard-Gen     Model Size=0.6B, Evalu...   2026.01  87.9
WildGuard          Model Size=7B               2026.01  87.6
Qwen3Guard-Gen     Model Size=4B, Evaluat...   2026.01  87.4
GPT-OSS-SafeGuard  Model Size=20B              2026.01  84.3
NemotronGuardV2    Model Size=8B               2026.01  82.2
Qwen3Guard-Gen     Model Size=0.6B, Evalu...   2026.01  82.1
Qwen3Guard-Gen     Model Size=8B, Evaluat...   2026.01  81.8
Qwen3Guard-Gen     Model Size=4B, Evaluat...   2026.01  81.2
ShieldGemma        Model Size=9B               2026.01  79.8
Llama3Guard        Model Size=8B               2026.01  77.8
Llama4Guard        Model Size=12B              2026.01  72.2