Jailbreaking on HarmBench and StrongReject 200 prompts (held-out)
[Chart: Success Rate Fraction. Leading result: PGD + Adaptive Probe-based Steering, 80 (May 19, 2026). Metrics available: Success Rate Fraction, HB Score, Success Rate.]
Updated 13d ago
Evaluation Results
| Method | Target Model | Date | Success Rate Fraction | HB Score | Success Rate |
|---|---|---|---|---|---|
| PGD + Adaptive Probe-based Steering | GLM-4.6V-... | 2026.05 | 80 | 97 | 98 |
| Adaptive Probe-based Steering | GLM-4.6V-... | 2026.05 | 71 | 85 | 97 |
| Adaptive Probe-based Steering | Qwen3-4B-... | 2026.05 | 67 | 94 | 97 |
| Adaptive Probe-based Steering | Llava-CB | 2026.05 | 66 | 88 | 96 |
| PGD + Adaptive Probe-based Steering | Llava-CB | 2026.05 | 66 | 81 | 92 |
| No Attack | GLM-4.6V-... | 2026.05 | 26 | 28 | 29 |
| No Attack | Qwen3-4B-... | 2026.05 | 19 | 3 | 4 |
| No Attack | Llava-CB | 2026.05 | 1 | 1 | 0 |