SOTA Jailbreak benchmarks and papers with code | Wizwand
Jailbreak
Benchmarks
| Dataset Name | SOTA Method | Metric | Metric Value | Results | Last Updated |
|---|---|---|---|---|---|
| JBB-Behaviors utilitarian dilemmas (test) | TRIAL | Jailbreak Success Rate | 87 | 72 | 1mo ago |
| JBB | Base | Jailbreak Rate | 0 | 70 | 3mo ago |
| Sorry | Abliteration | Jailbreak Rate | 99.8 | 70 | 3mo ago |
| AdvBench | LATS | Avg Queries | 2.1 | 63 | 3mo ago |
| HarmBench | FigStep | Toxicity Score | 1.01 | 50 | 14d ago |
| HarmBench Standard Behaviours (200 examples) | AutoDan | ASR | 0 | 48 | 3mo ago |
| Jailbreak scenario | LLM.int8() | ASR (Jailbreak Scenario) | 95.7 | 42 | 19d ago |
| WildJailBreak (WJB) (test) | Jailbreak-R1 | ASR@1 | 2 | 33 | 22d ago |
| JailbreakBench (original split) | TRACE (single) | ASR@1 | 95.15 | 33 | 22d ago |
| HarmBench (HB) (standard split) | TRACE (mix) | ASR@1 | 90.57 | 33 | 22d ago |
| AdvBench Ensemble configuration GPT-4o | ArtPrompt | Attack Success Rate (ASR) | 0 | 25 | 3mo ago |
| Jailbreak | Original | MMLU | 65.5 | 20 | 3mo ago |
| Malicious-Educator | H-CoT | Attack Success Rate (ASR) | 80 | 18 | 8d ago |
| AdvBench (test) | ReNeLLM | ASR (GPT-3.5 Turbo) | 91.35 | 16 | 16d ago |
| AdvBench Ensemble configuration Average | ArtPrompt | Harmfulness Score (HS) | 1.43 | 15 | 3mo ago |
| AdvBench Ensemble configuration Llama-3-70B | SATA-MLM | Harmfulness Score (HS) | 4.6 | 15 | 3mo ago |
| AdvBench Ensemble configuration Claude-v2 | ArtPrompt | Harmfulness Score (HS) | 1.08 | 15 | 3mo ago |
| Advbench-M Image + Text (test) | JOOD | HF (BE) | 980 | 13 | 3mo ago |
| StrongReject | HMNS | ASR (GPT-4o) | 96.1 | 12 | 1mo ago |
| JBB-Behaviors | HMNS | ASR (GPT-4o) | 99.2 | 12 | 1mo ago |
| AdvBench | HMNS | ASR (GPT-4o) | 99.1 | 12 | 1mo ago |
| AdvBench 50 most harmful requests | MIDAS | Attack Success Rate (ASR) | 95.83 | 12 | 3mo ago |
| AdvBench, HarmBench, and StrongReject | AGR | Time per Successful Attack (s) | 10.8 | 7 | 14d ago |
| AdvBench Gemini-2.5-Flash | AGR | ASR | 71.3 | 7 | 14d ago |
| AdvBench target: o4-mini | AGR | ASR (o4-mini) | 64 | 7 | 14d ago |
Showing 25 of 34 rows