Misuse Detection on Misuse Categories Psychological Harm (Anti-LGBTQ)
[Chart: selectable metrics AUC, b-ACC, FPR. Highlighted entry: Moderator (OpenAI), AUC 100, Jan 27, 2026.]
Updated 4d ago
Evaluation Results
| Method | Details | Date | AUC | b-ACC | FPR |
|---|---|---|---|---|---|
| Moderator (OpenAI) | Category=Moderation, B... | 2026.01 | 100 | 100 | 0 |
| GAVEL | Category=Classifier, B... | 2026.01 | 100 | 100 | 0 |
| RepBending | Category=Fine-Tuning,... | 2026.01 | 99 | 99 | 1 |
| CAST | Category=Inference-Tim... | 2026.01 | 99 | 91 | 17 |
| Llama Guard 4 (Meta) | Category=Moderation, B... | 2026.01 | 99 | 99 | 1 |
| Activation Classifier | Category=Classifier, B... | 2026.01 | 99 | 98 | 3 |
| Circuit Breakers | Category=Fine-Tuning,... | 2026.01 | 94 | 95 | 9 |
| JBShield | Category=Inference-Tim... | 2026.01 | 73 | 84 | 1 |
| Perspective (Google) | Category=Moderation, B... | 2026.01 | 8 | 62 | 0 |
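
The page does not say how the AUC, b-ACC, and FPR columns are computed. As a rough sketch, assuming the standard definitions (area under the ROC curve, balanced accuracy as the mean of true-positive and true-negative rates, and false-positive rate as FP / (FP + TN)), the three metrics could be derived from detector scores as follows. The function names, the 0.5 threshold, the toy data, and the scaling to a 0 to 100 range are all illustrative assumptions, not the benchmark's actual evaluation code.

```python
from typing import Sequence, Tuple


def confusion_counts(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); a sample is flagged when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def false_positive_rate(labels, scores, threshold=0.5) -> float:
    """FPR = FP / (FP + TN): share of benign samples wrongly flagged."""
    _, fp, tn, _ = confusion_counts(labels, scores, threshold)
    return fp / (fp + tn)


def balanced_accuracy(labels, scores, threshold=0.5) -> float:
    """b-ACC = (TPR + TNR) / 2, which is robust to class imbalance."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def roc_auc(labels, scores) -> float:
    """AUC as the probability that a random misuse sample outscores a
    random benign one (ties count as half), computed over all pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = misuse, 0 = benign.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]

print(100 * roc_auc(labels, scores))              # 75.0
print(100 * balanced_accuracy(labels, scores))    # 50.0
print(100 * false_positive_rate(labels, scores))  # 50.0
```

AUC is independent of any threshold, while b-ACC and FPR depend on the operating point chosen, which is one possible reason a method can show a high AUC alongside a weaker b-ACC or FPR.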