Prompt Classification on HarmBench Text Prompt
Top result: GPT-OSS-SafeGuard-20B, F1 Score 98.85 (Dec 29, 2025)
[Chart: F1 Score over time]
Updated 2d ago
Evaluation Results
Method                  Date     F1 Score
GPT-OSS-SafeGuard-20B   2025.12  98.85
LlamaGuard3-8B          2025.12  98.73
GPT4o-mini              2025.12  98.35
LlamaGuard4-12B         2025.12  97.44
GuardReasonerVL-7B      2025.12  96.64
GuardReasoner-8B        2025.12  95.42
LlamaGuard3-11B-Vision  2025.12  95.01
LlamaGuard2-8B          2025.12  92.62
WildGuard-7B            2025.12  92.04
ProGuard-7B             2025.12  91.6
ProGuard-3B             2025.12  89.2
Gemini2.5-Flash         2025.12  87.64
LlamaGuard-7B           2025.12  69.28
ShieldGemma-9B          2025.12  64.18