Misuse Detection on Misuse Categories Scam (Racism)
[Chart: AUC by date; plotted method: GAVEL (Jan 27, 2026). Selectable metrics: AUC, Balanced Accuracy, FPR.]
Updated 4d ago
Evaluation Results
| Method | Tags (truncated) | Date | AUC | Balanced Accuracy | FPR |
|---|---|---|---|---|---|
| GAVEL | Category=Classifier, B... | 2026.01 | 1 | 0.99 | 0 |
| CAST | Category=Inference-Tim... | 2026.01 | 0.99 | 0.98 | 0.04 |
| Moderator (OpenAI) | Category=Moderation, B... | 2026.01 | 0.99 | 0.99 | 0 |
| Activation Classifier | Category=Classifier, B... | 2026.01 | 0.98 | 0.95 | 0.02 |
| RepBending | Category=Fine-Tuning,... | 2026.01 | 0.96 | 0.96 | 0.07 |
| Llama Guard 4 (Meta) | Category=Moderation, B... | 2026.01 | 0.95 | 0.94 | 0.07 |
| Perspective (Google) | Category=Moderation, B... | 2026.01 | 0.89 | 0.62 | 0.01 |
| Circuit Breakers | Category=Fine-Tuning,... | 2026.01 | 0.87 | 0.88 | 0.23 |
| JBShield | Category=Inference-Tim... | 2026.01 | 0.69 | 0.81 | 0 |
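
The Balanced Accuracy and FPR columns can be read together. The page does not state how the metrics were computed, so the sketch below assumes the standard definitions: FPR = FP / (FP + TN), TPR = TP / (TP + FN), and balanced accuracy = (TPR + (1 - FPR)) / 2. The function names and the example counts are hypothetical, chosen only for illustration.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Share of benign inputs wrongly flagged as misuse (assumes fp + tn > 0)."""
    return fp / (fp + tn)


def true_positive_rate(tp: int, fn: int) -> float:
    """Share of misuse inputs correctly flagged (assumes tp + fn > 0)."""
    return tp / (tp + fn)


def balanced_accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Mean of TPR and TNR, where TNR = 1 - FPR (standard definition, assumed here)."""
    return (true_positive_rate(tp, fn) + (1.0 - false_positive_rate(fp, tn))) / 2.0


def implied_tpr(bal_acc: float, fpr: float) -> float:
    """Solve the balanced-accuracy formula for TPR given reported BA and FPR."""
    return 2.0 * bal_acc - (1.0 - fpr)


# Hypothetical counts: 100 misuse prompts (25 caught), 100 benign prompts (1 flagged).
print(round(balanced_accuracy(tp=25, fn=75, fp=1, tn=99), 2))

# Reported values for Perspective (Google): balanced accuracy 0.62, FPR 0.01.
print(round(implied_tpr(0.62, 0.01), 2))
```

Under these assumptions, a low FPR paired with a low balanced accuracy (as in the Perspective row) points to a detector that rarely flags benign inputs but also misses most misuse inputs. Because the table values are rounded to two decimals, any TPR recovered this way is approximate.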