Misuse Detection on Misuse Categories Psychological Harm (Delusional)
[Chart: AUC by date; GAVEL, 99 AUC (Jan 27, 2026). Metrics: AUC, b-ACC, FPR]
Updated 4d ago
Evaluation Results
| Method | Method details | Date | AUC | b-ACC | FPR |
|---|---|---|---|---|---|
| GAVEL | Category=Classifier, B... | 2026.01 | 99 | 95 | 0 |
| Activation Classifier | Category=Classifier, B... | 2026.01 | 98 | 93 | 7 |
| JBShield | Category=Inference-Tim... | 2026.01 | 81 | 85 | 3 |
| Perspective (Google) | Category=Moderation, B... | 2026.01 | 77 | 50 | 18 |
| Llama Guard 4 (Meta) | Category=Moderation, B... | 2026.01 | 62 | 86 | 1 |
| RepBending | Category=Fine-Tuning,... | 2026.01 | 57 | 57 | 1 |
| Moderator (OpenAI) | Category=Moderation, B... | 2026.01 | 50 | 50 | 0 |
| Circuit Breakers | Category=Fine-Tuning,... | 2026.01 | 49 | 50 | 6 |
| CAST | Category=Inference-Tim... | 2026.01 | 42 | 47 | 70 |