Misuse Detection on Misuse Categories Scam (Tax Authority)
[Chart of results (selectable metric: AUC, Balanced Accuracy, FPR). Leading entry: Llama Guard 4 (Meta), AUC 99, Jan 27, 2026.]
Evaluation Results
Method                 Attributes                 Date     AUC (%)  Balanced Accuracy (%)  FPR (%)
Llama Guard 4 (Meta)   Category=Moderation, B...  2026.01  99       99                     0
Activation Classifier  Category=Classifier, B...  2026.01  99       99                     1
GAVEL                  Category=Classifier, B...  2026.01  99       92                     2
RepBending             Category=Fine-Tuning,...   2026.01  98       99                     2
Circuit Breakers       Category=Fine-Tuning,...   2026.01  67       68                     1
Moderator (OpenAI)     Category=Moderation, B...  2026.01  50       50                     0
JBShield               Category=Inference-Tim...  2026.01  14       56                     0
CAST                   Category=Inference-Tim...  2026.01  8        24                     91
Perspective (Google)   Category=Moderation, B...  2026.01  1        49                     0
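The page reports three metrics without stating how they are computed. The sketch below shows their standard definitions under stated assumptions: binary labels (1 = misuse, 0 = benign), a detector that emits a score where higher means "more likely misuse", and a fixed decision threshold of 0.5 for the two threshold-dependent metrics. The benchmark's actual threshold, label convention, and scoring pipeline are not given on this page, and all function names here are hypothetical.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the fraction of
    (misuse, benign) pairs in which the misuse example scores higher,
    with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); an example is flagged when score >= threshold.
    Assumes labels are 0/1 (anything other than 1 is treated as benign)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def balanced_accuracy(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> float:
    """Mean of the true positive rate and the true negative rate."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def false_positive_rate(labels: Sequence[int], scores: Sequence[float],
                        threshold: float = 0.5) -> float:
    """Fraction of benign examples wrongly flagged as misuse."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return fp / (fp + tn)


if __name__ == "__main__":
    labels = [1, 1, 0, 0]          # toy data: 1 = misuse, 0 = benign
    scores = [0.9, 0.7, 0.6, 0.1]  # hypothetical detector scores
    # Scaled by 100 to match the table's 0-100 presentation.
    print(round(100 * roc_auc(labels, scores)))              # 100
    print(round(100 * balanced_accuracy(labels, scores)))    # 75
    print(round(100 * false_positive_rate(labels, scores)))  # 50
```

In this toy example every misuse score exceeds every benign score, so AUC is perfect, yet at a 0.5 threshold one benign example is still flagged. AUC is threshold-free while balanced accuracy and FPR depend on where the threshold sits, which is one way a method can show AUC 99 alongside a lower balanced accuracy. An AUC well below 50 means the scores tend to rank benign examples above misuse ones.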