
Prompt Classification on HarmBench Text Prompt

98.85 F1 Score

GPT-OSS-SafeGuard-20B

Dec 29, 2025
Updated 1mo ago

Evaluation Results

Method  Links
98.85
98.73
2025.12
98.35
97.44
96.64
95.42
95.01
92.62
92.04
91.6
89.2
87.64
69.28
64.18