Harmful Content Detection on Standard Harmful Content Datasets Misdirection Attack

97 Phishing

GAVEL

Updated 1mo ago

Evaluation Results

Method	Links
GAVEL 2026.01		97	89	87	100	99	99	86	100	89
GPT4 2026.01		55	49	12	35	15	16	0	24	1