
Prompt Classification on HarmBench Text Prompt

98.85 F1 Score

GPT-OSS-SafeGuard-20B

[Chart: F1 Score over time]
Dec 29, 2025
Updated 2d ago

Evaluation Results

Method    Links
98.85
98.73
2025.12
98.35
97.44
96.64
95.42
95.01
92.62
92.04
91.6
89.2
87.64
69.28
64.18