
Jailbreaking on AdvBench (BERT Score & Harmful Scores)

Top result: JULI, BERT Score 4.84 (May 17, 2025)

Evaluation Results

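| Method | Date | BERT Score | Harmful Score 1 | Harmful Score 2 |
|---|---|---|---|---|
| JULI | 2025.05 | 4.84 | 4.76 | 3.73 |
| | 2025.05 | 4.81 | 4.66 | 3.68 |
| | 2025.05 | 4.76 | 2.06 | 0.50 |
| | 2025.05 | 4.75 | 4.72 | 3.98 |
| | 2025.05 | 4.43 | 4.28 | 3.36 |
| | 2025.05 | 4.42 | 1.93 | 0.48 |
| | 2025.05 | 4.39 | 4.10 | 3.02 |
| | 2025.05 | 4.37 | 4.19 | 3.19 |
| | 2025.05 | 4.33 | 4.57 | 3.44 |
| | 2025.05 | 4.05 | 1.95 | 0.77 |
| | 2025.05 | 4.00 | 4.22 | 2.99 |
| | 2025.05 | 3.96 | 1.54 | 0.45 |
| | 2025.05 | 3.95 | 2.80 | 0.81 |
| | 2025.05 | 3.94 | 4.22 | 3.50 |
| | 2025.05 | 3.70 | 3.08 | 2.35 |
| | 2025.05 | 3.69 | 2.96 | 1.84 |
| | 2025.05 | 3.65 | 4.21 | 3.13 |
| | 2025.05 | 3.63 | 2.79 | 2.02 |
| | 2025.05 | 3.54 | 2.87 | 0.87 |
| | 2025.05 | 3.54 | 2.87 | 0.87 |
| | 2025.05 | 3.54 | 3.82 | 2.62 |
| | 2025.05 | 3.42 | 3.70 | 2.22 |
| | 2025.05 | 3.41 | 3.13 | 1.33 |
| | 2025.05 | 3.34 | 2.44 | 0.79 |
| | 2025.05 | 3.24 | 1.26 | 0.60 |
| | 2025.05 | 3.14 | 1.37 | 0.42 |
| | 2025.05 | 3.13 | 2.77 | 2.05 |
| | 2025.05 | 3.07 | 1.40 | 0.41 |
| | 2025.05 | 2.98 | 3.04 | 2.14 |
| | 2025.05 | 2.95 | 1.92 | 0.75 |
| | 2025.05 | 2.91 | 3.12 | 2.21 |
| | 2025.05 | 2.65 | 3.68 | 2.16 |
| | 2025.05 | 2.63 | 3.77 | 2.25 |
| | 2025.05 | 2.58 | 2.52 | 1.74 |
| | 2025.05 | 2.46 | 2.38 | 1.26 |
| | 2025.05 | 2.41 | 2.51 | 1.74 |
| | 2025.05 | 2.39 | 2.52 | 1.38 |
| | 2025.05 | 2.28 | 1.71 | 0.76 |
| | 2025.05 | 2.09 | 2.09 | 1.29 |
| | 2025.05 | 2.07 | 1.48 | 0.41 |
| | 2025.05 | 2.07 | 1.48 | 0.41 |
| | 2025.05 | 1.98 | 1.21 | 0.17 |
| | 2025.05 | 1.92 | 1.04 | 0.07 |
| | 2025.05 | 1.87 | 1.64 | 0.56 |
| | 2025.05 | 1.82 | 1.38 | 0.35 |
| | 2025.05 | 1.81 | 1.56 | 0.44 |
| | 2025.05 | 1.70 | 1.91 | 1.21 |
| | 2025.05 | 1.64 | 1.40 | 0.39 |
| | 2025.05 | 1.64 | 1.40 | 0.39 |
| | 2025.05 | 1.56 | 1.40 | 0.44 |
| | 2025.05 | 1.32 | 1.21 | 0.21 |
| | 2025.05 | 1.08 | 1.01 | 0.00 |
| | 2025.05 | 0.79 | 1.04 | 0.04 |
| | 2025.05 | 0.64 | 1.00 | 0.02 |
| | 2025.05 | 0.61 | 1.02 | 0.06 |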