
JBB-Behaviors

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Defense | JBB-Behaviors | ASR: 0 | 121 |
| Jailbreak | JBB-Behaviors utilitarian dilemmas (test) | Jailbreak Success Rate: 87 | 72 |
| Jailbreak Attack | JBB-Behaviors | Rule-Judge Score: 100 | 56 |
| Jailbreaking | JBB Behaviors | ASR: 100 | 35 |
| Jailbreak Attack | JBB Behaviors | ASR: 100 | 35 |
| Jailbreaking | JBB-Behaviors (test) | ASR (GPT-4o): 99 | 27 |
| Jailbreak Robustness | JBB-Behaviors (test) | ASR: 0 | 24 |
| LLM Jailbreaking | JBB-Behaviors Scenario J3 | Hypervolume: 0.707 | 21 |
| LLM Jailbreaking | JBB-Behaviors Scenario J2 | Hypervolume: 0.691 | 21 |
| LLM Jailbreaking | JBB-Behaviors Scenario J1 | Hypervolume: 59.1 | 21 |
| Robustness against priming vulnerability | JBB-Behaviors (test) | ASR (Guardrail Model): 0 | 20 |
| Jailbreak Attack Robustness | JBB-Behaviors | ASR (PAIR): 10 | 18 |
| Jailbreak Robustness | JBB-Behaviors | ASR (PAIR, Guardrail Model): 0.3 | 18 |
| Jailbreak | JBB-Behaviors | ASR (GPT-4o): 99.2 | 12 |
| Safety Evaluation | JBB-Behaviors | Safety Score: 99.3 | 9 |
| Safety Evaluation | JBB-Behaviors | Unsafe Interaction Rate: 0 | 3 |
Showing 16 of 16 rows