JailbreakV

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Inference Latency | JailbreakV | Latency (s): 2.81 | 25 |
| Text-Based Jailbreak Attack | JailbreakV-28K (test) | ASR (None-Template): 75.23 | 25 |
| Safety Evaluation | JailbreakV-28K v1 (test) | ASR (Noise-T): 6.63 | 18 |
| Safety Evaluation | JailBreakV | ASR: 6.55 | 18 |
| Abnormal Behavior Detection | JailBreakV (test) | Accuracy: 100 | 17 |
| Malicious Prompt Detection | JailbreakV_28K Text-based (test) | FNR: 0 | 16 |
| Malicious Prompt Detection | JailbreakV_28K Image-based (test) | FNR: 0.19 | 16 |
| Image-based Jailbreak | JailbreakV_28K IND | ASR: 0.19 | 16 |
| Response Safety | JailBreakV-28K (avg) | JBV-R Score: 0.975 | 15 |
| Jailbreak Defense | JailBreakV | Attack Success Rate (ASR): 6.55 | 14 |
| Transfer Attack | JailBreakV-28K | ASR (with SN): 78.9 | 11 |
| Jailbreak Attack | JailbreakV | ASR: 0 | 10 |
| Safety Evaluation | JailbreakV-28K MLLM | ASR: 0 | 10 |
| Safety Evaluation | JailbreakV-28K LLM | ASR: 1.46 | 10 |
| Jailbreak Detection | JailbreakV | AUROC: 99.69 | 9 |
| Jailbreak Defense | JailbreakV-28K | ASR (Noise, T): 8.4 | 6 |
| Jailbreak Attack Defense | JailbreakV-28K v1 (test) | Defense Success Rate (Noise - T): 38.16 | 6 |
| Jailbreak Attack | JailBreakV_28K | Attack Success Rate (ASR): 58.97 | 3 |
| Robustness to Jailbreak Attacks | JailbreakV-28K | Harmful Reasoning Ratio: 24.5 | 3 |
| Safety Evaluation | JailBreakV (test) | HPR: 23 | 3 |