Jailbreak Defense on jailbreak defense dataset

ASR: 0

Gradient-guided Token Masking

[Chart: axis ticks -3.8908, 22.3721, 48.635, 74.8979; date label May 24, 2026]
Updated 8d ago

Evaluation Results

| Date | ASR | Other columns (values run together in source) |
|---|---|---|
| 2026.05 | 0 | - |
| 2026.05 | 0 | - |
| 2026.05 | 0.91 | - |
| 2026.05 | 0.91 | - |
| 2026.05 | 0.91 | - |
| 2026.05 | 0.91 | - |
| 2026.05 | 0.91 | - |
| 2026.05 | 1.82 | - |
| 2026.05 | 6.36 | - |
| 2026.05 | 9.09 | - |
| 2026.05 | 10 | - |
| 2026.05 | 10.9 | - |
| 2026.05 | 13.63 | - |
| 2026.05 | 13.64 | - |
| 2026.05 | 15.45 | - |
| 2026.05 | 17.27 | - |
| 2026.05 | 20 | - |
| 2026.05 | 20.91 | - |
| 2026.05 | 31.81 | - |
| 2026.05 | 68.18 | - |
| 2026.05 | 78.18 | - |
| 2026.05 | 82.73 | - |
| 2026.05 | 94.55 | - |
| 2026.05 | 97.27 | - |
| 2026.04 | - | 924805875452849.43 |
| 2026.04 | - | 10097010086936377 |
| 2026.04 | - | 10076619888906782.86 |
| 2026.04 | - | 9251679981969683.14 |
| 2026.04 | - | 100965997899810091.29 |
| 2026.04 | - | 100100100100949910099 |
| 2026.04 | - | 9690479886917383 |
| 2026.04 | - | 10010093100951009998.14 |
| 2026.04 | - | 2521227119420.57 |
| 2026.04 | - | 91426905417347.14 |
| 2026.04 | - | 77816710079887071.71 |
| 2026.04 | - | 25273396919910067.29 |
| 2026.04 | - | 100100979795959897.43 |
| 2026.04 | - | 100100100100999910099.71 |
| 2026.04 | - | 9910090100971009997.85 |
| 2026.04 | - | 100100981009710010099.28 |