
Harmful Content Detection on Standard Harmful Content Datasets Evasion Attack

96 Phishing

GAVEL

Updated 1mo ago

Evaluation Results

Method	Links
GAVEL 2026.01		96	95	97	100	100	98	71	97	100
GPT4 2026.01		91	95	56	66	85	87	91	92	97
GPT4 2026.01		86	91	29	80	54	94	100	91	84