Safety Classification on ToxicChat (out-of-distribution)
[Chart: F1 Score and AUPRC. Top result: Multi-head self-attn, 72.88 F1 Score (Jan 19, 2026).]
Evaluation Results

| Method | Details | Links | F1 Score | AUPRC |
|---|---|---|---|---|
| Multi-head self-attn | Training dataset=WildG... | 2026.01 | 72.88 | 0.798 |
| WildGuard | Added Params (M)=7000,... | 2026.01 | 70.8 | - |
| Aegis-Guard-D | Added Params (M)=8000,... | 2026.01 | 70 | - |
| GPT-4 | Added Params (M)=17000... | 2026.01 | 68.3 | - |
| ShieldHead (Gemma2-27B) | Added Params (M)=306,... | 2026.01 | 67.7 | - |
| Scoring attention | Training dataset=WildG... | 2026.01 | 64.81 | 0.706 |
| ShieldHead (Llama3.1-8B) | Added Params (M)=91, E... | 2026.01 | 64.3 | - |
| Llama Guard | Added Params (M)=7000,... | 2026.01 | 61.6 | 0.626 |
| OpenAI Moderation | Extra LM call=true | 2026.01 | 61.4 | 0.631 |
| Direct pooling | Training dataset=WildG... | 2026.01 | 53.33 | 0.565 |
| Llama-Guard2 | Added Params (M)=8000,... | 2026.01 | 47.1 | - |
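The two metric columns above appear to use different scales: F1 Score looks like a percentage (0 to 100) and AUPRC like a fraction (0 to 1). The page does not say how either metric was computed, so the following is only a minimal sketch of one common way to compute them for a binary safe/unsafe classifier. The toy labels, scores, and the 0.5 threshold are hypothetical, and AUPRC is approximated here by average precision, which is an assumption rather than the benchmark's documented procedure.

```python
def f1_score(y_true, y_pred):
    """F1 of the positive (unsafe = 1) class, as a fraction in [0, 1]."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Step-wise estimate of AUPRC: mean precision at each positive's rank.

    Ties between scores are not handled specially in this sketch.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = unsafe, 0 = safe.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

f1_percent = 100 * f1_score(y_true, y_pred)  # percentage scale, like the F1 column
auprc = average_precision(y_true, scores)    # fraction scale, like the AUPRC column
print(round(f1_percent, 2), round(auprc, 3))
```

F1 depends on the decision threshold while average precision depends only on the ranking of scores, which is one reason a leaderboard might report both.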