Response Harmfulness Detection on BeaverTails

89.9 F1 Score

BeaverDam 7B

Updated 1mo ago

Evaluation Results

Method	F1 Score	(unlabeled)
BeaverDam 7B 2026.02	89.9	-
BeaverDam 2026.05	89.9	-
ConsisGuard 2026.05	88.3	-
GuardReasoner 8B 2026.02	87.6	-
GuardReasoner 2026.05	87.6	-
GuardReasoner 2026.05	87.6	-
ConsisGuard 2026.05	87.44	-
GuardReasoner 2026.05	86.72	-
GuardReasoner 2026.05	86.72	-
MD-Judge 7B 2026.02	86.7	-
MD-Judge 2026.05	86.7	-
MD-Judge 2026.05	86.7	-
COLAGUARD 2026.05	86.55	-
Qwen3Guard Gen 8B 2026.02	86.48	-
COLAGUARD 2026.05	86.29	-
GuardReasoner-Omni 4B 2026.02	86.04	-
GuardReasoner-Omni 2B 2026.02	85.89	-
GuardReasoner 2026.05	85.84	-
Qwen3Guard Gen 8B 2026.02	85.35	-
GuardReasoner-VL 3B 2026.02	85.19	-
GuardReasoner-VL 7B 2026.02	84.99	-
WildGuard 2026.05	84.66	-
WildGuard 7B 2026.02	84.4	-
WildGuard 2026.05	84.4	-
GPT-4o+CoT 2026.05	83.41	-
Gemini 1.5+CoT 2026.05	83.27	-
Claude 3.5+CoT 2026.05	83.2	-
GPT-4o+CoT 2026.05	82.26	-
o1-pre+CoT 2026.05	80.89	-
GPT-4+CoT 2026.05	80.44	-
o1-preview 2026.05	79.96	-
PolyGuard-Qwen 7B 2026.02	79.39	-
qwen3-CoT 2026.05	78.7	-
GPT-4o 2026.05	78.63	-
QwQ+CoT 2026.05	77.89	-
QwQ-preview 2026.05	77.26	-
HarmBench LLaMA 13B 2026.02	77.1	-
HarmBench LLaMA 2026.05	77.1	-
HarmBench Mistral 7B 2026.02	75.2	-
HarmBench Mistral 2026.05	75.2	-
Aegis Guard Defensive 7B 2026.02	74.7	-
Aegis Guard Defensive 2026.05	74.7	-
Aegis Guard Def 2026.05	74.7	-
Aegis Guard Permissive 7B 2026.02	73.8	-
Aegis Guard Permissive 2026.05	73.8	-
Aegis Guard Per 2026.05	73.8	-
LLaMA Guard 2 2026.05	71.8	-
LLaMA Guard 2 2026.05	71.8	-
LLaMA Guard 4 12B 2026.02	69.51	-
LLaMA Guard 3 8B 2026.02	67.84	-
LLaMA Guard 3 2026.05	67.84	-
LLaMA Guard 3 2026.05	67.84	-
LLaMA Guard 2026.05	67.1	-
ShieldGemma 9B 2026.02	63.61	-
ShieldGemma 2026.05	63.61	-
ShieldGemma 2026.05	63.61	-
ShieldGemma 2026.05	30.97	-
ShieldGemma 2026.05	30.97	-
Moderation 2026.05	15.7	-
SIREN 2026.04	-	83.5
Qwen3Guard 2026.04	-	77.1
SIREN 2026.04	-	83.7
LlamaGuard3 2026.04	-	70
SIREN 2026.04	-	84.3
Qwen3Guard 2026.04	-	80.1
SIREN 2026.04	-	83.8
LlamaGuard3 2026.04	-	68.8