Share your thoughts, 1 month free Claude Pro on us
See more
Home
/
Benchmarks
Jailbreak Attack on Harmful behaviors jailbreak evaluation set
Loading...
100
ASR (Harmful Behaviors)
Baseline
-4
23
50
77
May 27, 2026
ASR (Harmful Behaviors)
Updated 6d ago
Evaluation Results
Method
Method
Links
ASR (Harmful Behaviors)
Baseline
Target Model=Qwen3-8B
2026.05
100
Baseline
Target Model=Qwen3-1.7B
2026.05
100
Baseline
Target Model=Phi-4-rea...
2026.05
97
Baseline
Target Model=GPT-OSS-20B
2026.05
94
Baseline
Target Model=Gemma-4-E...
2026.05
88
BCT
Target Model=Gemma-4-E...
2026.05
57
BCT
Target Model=Qwen3-1.7B
2026.05
36
ACT
Target Model=Gemma-4-E...
2026.05
33
BCT
Target Model=Qwen3-8B
2026.05
30
BCT
Target Model=Phi-4-rea...
2026.05
15.3
ACT
Target Model=Qwen3-1.7B
2026.05
14
ACT
Target Model=GPT-OSS-20B
2026.05
1
BCT
Target Model=GPT-OSS-20B
2026.05
1
ACT
Target Model=Qwen3-8B
2026.05
0
ACT
Target Model=Phi-4-rea...
2026.05
0
Feedback
Search any
task
Search any
task