Jailbreak Robustness on StrongReject (Detailed Attack Metrics)

Direct Attack Rate: 67

No Defense

-2.6815.4133.551.59
Oct 24, 2025
Updated 1mo ago

Evaluation Results

Method   Links
2025.10
67865560806780-56437667
2025.10
28794453396073-59176351
2025.10
2057363229067-6586338
2025.10
6411622201511-2775522
2025.10
34119571527-44254623
2025.10
27131975743-66157738
2025.10
235151471719-25163719
2025.10
2461963041-226819
2025.10
0308301728-2243415
2025.10
0260532923-237112
2025.10
0501645137-2756220
2025.10
021700041-1444313
2025.10
039700038-4903717
2025.10
024200020-110137
2025.10
011200107-9685
2025.10
0113011210-121287
2025.10
06019722457-3594326
2025.10
058012122465-3224123
2025.10
014130003-607111
2025.10
0351000513-279010
2025.10
0341700016-1124212
2025.10
0412332600-063614
2025.10
0101051236-51299
2025.10
0102001125-4018912
2025.10
0602020202454-42113930
2025.10
056027302363-5082328
2025.10
0331630531-3165718
2025.10
0532011050-30307126
2025.10
01559181328-93711
2025.10
020426271326-261114