
Jailbreak Attack on AdvBench 50 harmful behaviors

GPT-3.5 Turbo Jailbreak Rate: 100 (PromptAttack)

Oct 2, 2024

Evaluation Results

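Jailbreak rate (%) per target model. Column 1 is GPT-3.5 Turbo, the ranking metric; T2 to T8 are the other seven target models, which are not named here. Avg is the mean of the eight rates. ? = value unavailable.

Date | GPT-3.5 Turbo | T2 | T3 | T4 | T5 | T6 | T7 | T8 | Avg
2024.10 | 100 | 100 | 100 | 100 | 36 | 24 | 38 | 16 | 64.25
2024.10 | 98 | 64 | 74 | 82 | 76 | 18 | 12 | 58 | 60.25
2024.10 | 98 | 94 | 32 | 92 | 84 | ? | ? | 60 | 63
2024.10 | 94 | 84 | 72 | 92 | 86 | 18 | 10 | 62 | 64.75
2024.10 | 92 | 94 | 84 | 98 | 74 | 100 | 0 | 94 | 79.5
2024.10 | 92 | 100 | 28 | 98 | 62 | ? | ? | 92 | 61.75
2024.10 | 92 | 88 | 60 | 86 | 50 | 4 | 2 | 54 | 54.5
2024.10 | 90 | 92 | 86 | 90 | 56 | 84 | 24 | 94 | 77
2024.10 | 86 | 28 | 16 | 42 | 28 | 0 | 0 | 76 | 34.5
2024.10 | 80 | 82 | 68 | 70 | 88 | 92 | 78 | 52 | 76.25
2024.10 | 78 | 48 | 36 | 50 | 12 | 20 | 18 | 18 | 35
2024.10 | 78 | ? | ? | ? | ? | 12 | 16 | 16 | 38.25
2024.10 | 76 | 78 | 62 | 80 | 68 | 16 | 12 | 90 | 60.25
2024.10 | 70 | 32 | 36 | ? | ? | ? | 6 | 6 | 24.75
2024.10 | 68 | 22 | 14 | 40 | 26 | 18 | ? | ? | 32.5
2024.10 | 64 | 34 | 42 | 60 | 10 | 0 | 4 | 38 | 31.5
2024.10 | 50 | 0 | 0 | 0 | 4 | 0 | 0 | 20 | 9.25
2024.10 | 38 | 0 | 2 | 0 | 4 | 0 | 0 | 18 | 7.75
2024.10 | 38 | 8 | 30 | 40 | 20 | 0 | 0 | 46 | 22.75
2024.10 | 36 | 0 | 0 | 0 | 4 | 0 | 0 | 14 | 6.75
2024.10 | 36 | 0 | 0 | 64 | 4 | 0 | 0 | 4 | 13.5
2024.10 | 32 | 2 | 2 | 0 | 4 | 12 | 0 | 8 | 7.5
2024.10 | 32 | 46 | 24 | 50 | 28 | 24 | 0 | 42 | 30.75
2024.10 | 26 | 46 | 34 | 70 | 34 | 0 | 0 | 70 | 35
2024.10 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3
2024.10 | 22 | 0 | 0 | 0 | 0 | 14 | 0 | 2 | 4.75
2024.10 | 20 | 0 | 0 | 0 | 4 | 8 | 0 | 4 | 4.5
2024.10 | 20 | 6 | ? | 0 | 0 | 0 | ? | ? | 6
2024.10 | 4 | ? | ? | ? | 30 | 26 | 0 | 58 | 18.75
2024.10 | 4 | 12 | 24 | 36 | 0 | 0 | 0 | 62 | 17.25
2024.10 | 2 | 0 | 68 | 0 | 22 | 10 | 0 | 2 | 13
2024.10 | 0 | 0 | 36 | 0 | 0 | 0 | 0 | 0 | 4.5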
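On this leaderboard each row's average is the unweighted mean of its eight per-target jailbreak rates. For example, the top row gives (100 + 100 + 100 + 100 + 36 + 24 + 38 + 16) / 8 = 64.25. The per-target rates are all even numbers, which is what a success count out of 50 behaviors produces (k / 50 × 100 = 2k). The minimal Python sketch below reproduces both computations. It assumes a rate is simply successes / 50 × 100; the page does not say how an individual attack attempt is judged successful. The function names `jailbreak_rate` and `average_rate` are hypothetical and are not part of any benchmark API.

```python
NUM_BEHAVIORS = 50  # size of the AdvBench subset named in the page title


def jailbreak_rate(num_successes: int, num_behaviors: int = NUM_BEHAVIORS) -> float:
    """Percentage of behaviors judged successfully jailbroken (assumed definition)."""
    if not 0 <= num_successes <= num_behaviors:
        raise ValueError("num_successes must lie between 0 and num_behaviors")
    return 100.0 * num_successes / num_behaviors


def average_rate(per_target_rates: list[float]) -> float:
    """Unweighted mean over target models (hypothetical helper)."""
    if not per_target_rates:
        raise ValueError("need at least one per-target rate")
    return sum(per_target_rates) / len(per_target_rates)


# Top row of the evaluation results: eight per-target rates in percent.
first_row = [100, 100, 100, 100, 36, 24, 38, 16]
print(average_rate(first_row))  # 64.25
print(jailbreak_rate(18))       # 36.0, i.e. 18 of 50 behaviors
```

An unweighted mean gives each target model equal weight. That is reasonable here only because every target is evaluated on the same 50 behaviors.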