Prompt Harmfulness Detection on AegisSafety (test)

99.5 F1 Score

LLaMA Guard 3

Updated 1mo ago

Evaluation Results

Method	Links	F1 Score
LLaMA Guard 3 2026.05		99.5
GuardReasoner 2026.05		91.39
GuardReasoner 2026.05		91.39
ConsisGuard 2026.05		91.02
COLAGUARD 2026.05		90.58
GuardReasoner 2026.05		90.18
GuardReasoner 2026.05		90.18
WildGuard 2026.05		89.69
COLAGUARD 2026.05		89.45
WildGuard 2026.05		89.4
GuardReasoner 2026.05		89.34
GPT-4o+CoT 2026.05		88.24
ConsisGuard 2026.05		86.58
GPT-4o+CoT 2026.05		86.32
Aegis Guard Defensive 2026.05		84.8
Aegis Guard Def 2026.05		84.8
qwen3-CoT 2026.05		83.36
o1-preview 2026.05		83.15
Aegis Guard Permissive 2026.05		82.9
Aegis Guard Per 2026.05		82.9
Gemini 1.5+CoT 2026.05		82.88
o1-pre+CoT 2026.05		81.96
GPT-4o 2026.05		81.07
GPT-4+CoT 2026.05		80.52
QWQ+CoT 2026.05		80.5
QwQ-preview 2026.05		80.23
Claude 3.5+CoT 2026.05		78.62
ShieldGemma 2026.05		77.63
ShieldGemma 2026.05		77.63
MPNet-based NBF 2025.02		74.8
LLaMA Guard 2025.02		74.1
LLaMA Guard 2026.05		74.1
DistilRoBERTa-based NBF 2025.02		74
LLaMA Guard 2 2026.05		71.8
LLaMA Guard 2 2026.05		71.8
LLaMA Guard 3 2026.05		71.39
OpenAI Moderation 2025.02		31.9
Moderation 2026.05		31.9
ShieldGemma 2025.02		7.5
ShieldGemma 2026.05		7.47
ShieldGemma 2026.05		7.47