Harmful Content Detection on Standard Harmful Content Datasets (Goal Hijacking Attack)
[Chart: Phishing scores. GAVEL: 96 (Jan 27, 2026).]
Updated 4d ago
Evaluation Results
Method  Date     Phishing  SQL Injection  Delusional  Anti-LGBTQ  Elections  Racism  Tax Authority  Romance  E-commerce
GAVEL   2026.01  96        89             87          100         99         99      86             100      90
GPT4    2026.01  55        70             49          90          63         91      28             48       12