
Prompt Classification on HarmBench Text Prompt

98.85 F1 Score

GPT-OSS-SafeGuard-20B

Updated 4mo ago

Evaluation Results

Method	Links	F1 Score
GPT-OSS-SafeGuard-20B 2025.12		98.85
LlamaGuard3-8B 2025.12		98.73
GPT4o-mini 2025.12		98.35
LlamaGuard4-12B 2025.12		97.44
GuardReasonerVL-7B 2025.12		96.64
GuardReasoner-8B 2025.12		95.42
LlamaGuard3-11B-Vision 2025.12		95.01
LlamaGuard2-8B 2025.12		92.62
WildGuard-7B 2025.12		92.04
ProGuard-7B 2025.12		91.6
ProGuard-3B 2025.12		89.2
Gemini2.5-Flash 2025.12		87.64
LlamaGuard-7B 2025.12		69.28
ShieldGemma-9B 2025.12		64.18