JBB

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Jailbreak | JBB | Jailbreak Rate: 0 | 70 |
| Safety Performance | JBB | Refusal Score (CR): 53 | 35 |
| Safety Evaluation | JBB | Safe@1: 59.67 | 18 |
| Jailbreak Attack | JBB-Behaviors | S2C (Ours): 100 | 16 |
| Jailbreak Attack | JBB Qwen3-4B | Loss: 0.149 | 13 |
| Jailbreak Attack | JBB | Llama2-7B ASR: 91.7 | 12 |
| Jailbreak Attack Evaluation | JBB sampled harmful behaviors | PAIR Success Rate: 100 | 12 |
| Jailbreak Attack | JBB Gemma3-4B | Loss: 0.348 | 8 |
| Jailbreak Attack | JBB Llama2-7B | Loss: 0.106 | 8 |
| Jailbreak Attack | JBB Llama3.1-8B | Loss: 0.466 | 7 |
| Jailbreak Attack | JBB | ASR (Paper): 97 | 7 |
| Safety Evaluation | JBB | Score: 93 | 6 |
| Multi-label Safety Categorization | jbb behaviors category | Macro Accuracy: 59.37 | 4 |
| Multi-label Safety Categorization | jbb behaviors behavior | Macro Accuracy: 72.17 | 4 |
| Robustness to Jailbreak Attacks | JBB Paraphrased | Harmful Reasoning Ratio: 16.1 | 3 |
| Jailbreak Attack Evaluation | JBB | Attack Success Count: 1 | 2 |
| Transfer Jailbreak Attack | JBB Sonnet-4 | ASR: 21.5 | 2 |
| Transfer Jailbreak Attack | JBB GPT-5-mini | Attack Success Rate (ASR): 60 | 2 |
| Transfer Jailbreak Attack | JBB Target: Gemini-3-flash | ASR: 5.6 | 2 |
Showing 19 of 19 rows