Harmful Prompt Detection on AdvBench & XSTest (held-out evaluation sets)

0.9642 AUROC (Harmful/Normal)

LatentBiopsy

Updated 3mo ago

Evaluation Results

Method	Links	AUROC (Harmful/Normal)
LatentBiopsy 2026.03		0.9642	1	0.9373	1	0.9758	0.384	92.8
LatentBiopsy 2026.03		0.9585	1	0.9373	1	0.972	0.149	90.2
LatentBiopsy 2026.03		0.9517	1	0.9165	1	0.9674	0.427	89.9
LatentBiopsy 2026.03		0.9497	1	0.9117	1	0.9661	0.434	89.9
LatentBiopsy 2026.03		0.942	1	0.9129	1	0.9609	0.219	87.5
LatentBiopsy 2026.03		0.9374	1	0.8978	1	0.9577	0.179	88.2