
Jailbreak Detection on DrAttack using ASR and PGR

3ASR

SelfDefend (Intent)

Apr 1, 2026
Updated 16d ago

Evaluation Results

Method Links
2026.04  320
2026.04  392
2026.04  317
2026.04  410
2026.04  69
2026.04  654
2026.04  695
2026.04  850
2026.04  836
2026.04  999
2026.04  10-
2026.04  10100
2026.04  1042
2026.04  1094
2026.04  1084