Jailbreak Attack on SafeBench (HF and ASR)

1.7 HF

FigStep-Pro

1.4882.9194.355.781
Mar 8, 2026
Updated 1mo ago

Evaluation Results

Date      HF    ASR
2026.03   1.7   4.9
2026.03   2     17.7
2026.03   2.5   0.9
2026.03   2.5   1.1
2026.03   2.6   1.1
2026.03   2.8   2
2026.03   2.9   6.3
2026.03   3     1.7
2026.03   3.2   11.4
2026.03   3.7   51.4
2026.03   3.8   5.4
2026.03   3.8   7.1
2026.03   4     19.7
2026.03   4     16.3
2026.03   4.1   13.7
2026.03   4.3   24.6
2026.03   4.3   35.7
2026.03   4.4   20.6
2026.03   4.5   20.9
2026.03   4.6   34
2026.03   4.6   36.3
2026.03   4.6   28.6
2026.03   4.7   26.9
2026.03   4.7   46.6
2026.03   4.8   25.4
2026.03   4.9   34.4
2026.03   5.2   38.9
2026.03   5.4   52.9
2026.03   5.4   41.1
2026.03   5.4   60.6
2026.03   5.6   74
2026.03   5.8   40.6
2026.03   5.9   36
2026.03   5.9   65.1
2026.03   6     51.4
2026.03   6     54
2026.03   6.1   55.7
2026.03   6.1   77.7
2026.03   6.1   80.6
2026.03   6.1   56.3
2026.03   6.1   70.9
2026.03   6.3   56
2026.03   6.3   60
2026.03   6.3   74.6
2026.03   6.4   56
2026.03   6.4   67.7
2026.03   6.4   50.9
2026.03   6.4   52.3
2026.03   6.4   54
2026.03   6.5   59.7
2026.03   6.8   54
2026.03   7     63.1
2026.03   7     56
2026.03   7     56.9