Harmful Prompt Detection on AdvBench & XSTest (held-out evaluation sets)

0.9642 AUROC (Harmful/Normal)

LatentBiopsy

Mar 28, 2026
Updated 19d ago

Evaluation Results

Method | Links
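2026.03 | 0.9642 | 1 | 0.9373 | 1 | 0.9758 | 0.384 | 92.8
2026.03 | 0.9585 | 1 | 0.9373 | 1 | 0.972 | 0.149 | 90.2
2026.03 | 0.9517 | 1 | 0.9165 | 1 | 0.9674 | 0.427 | 89.9
2026.03 | 0.9497 | 1 | 0.9117 | 1 | 0.9661 | 0.434 | 89.9
2026.03 | 0.942 | 1 | 0.9129 | 1 | 0.9609 | 0.219 | 87.5
2026.03 | 0.9374 | 1 | 0.8978 | 1 | 0.9577 | 0.179 | 88.2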