Harmful Content Detection on Standard Harmful Content Datasets Misdirection Attack
[Figure: score by category. Highlighted point: GAVEL, Phishing, 97 (Jan 27, 2026).]
Updated 1mo ago
Evaluation Results
Method  Date     Phishing  SQL Injection  Delusional  Anti-LGBTQ  Elections  Racism  Tax Authority  Romance  E-commerce
GAVEL   2026.01  97        89             87          100         99         99      86             100      89
GPT4    2026.01  55        49             12          35          15         16      0              24       1