SOTA Jailbreak benchmarks and papers with code | Wizwand
Jailbreak
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| JBB-Behaviors utilitarian dilemmas (test) | TRIAL | Jailbreak Success Rate | 87 | 72 | 2d ago |
| JBB | Base | Jailbreak Rate | 0 | 70 | 1mo ago |
| Sorry | Abliteration | Jailbreak Rate | 99.8 | 70 | 1mo ago |
| AdvBench | LATS | Avg Queries | 2.1 | 63 | 1mo ago |
| HarmBench Standard Behaviours (200 examples) | AutoDan | ASR | 0 | 48 | 1mo ago |
| AdvBench Ensemble configuration GPT-4o | ArtPrompt | Attack Success Rate (ASR) | 0 | 25 | 1mo ago |
| Jailbreak scenario | Qwen2.5-7B | ASR (Attacked) | 95.7 | 24 | 11d ago |
| Jailbreak | Original | MMLU | 65.5 | 20 | 1mo ago |
| AdvBench Ensemble configuration Average | ArtPrompt | Harmfulness Score (HS) | 1.43 | 15 | 1mo ago |
| AdvBench Ensemble configuration Llama-3-70B | SATA-MLM | Harmfulness Score (HS) | 4.6 | 15 | 1mo ago |
| AdvBench Ensemble configuration Claude-v2 | ArtPrompt | Harmfulness Score (HS) | 1.08 | 15 | 1mo ago |
| Advbench-M Image + Text (test) | JOOD | HF (BE) | 980 | 13 | 1mo ago |
| StrongReject | HMNS | ASR (GPT-4o) | 96.1 | 12 | 4d ago |
| JBB-Behaviors | HMNS | ASR (GPT-4o) | 99.2 | 12 | 4d ago |
| AdvBench | HMNS | ASR (GPT-4o) | 99.1 | 12 | 4d ago |
| AdvBench 50 most harmful requests | MIDAS | Attack Success Rate (ASR) | 95.83 | 12 | 1mo ago |
| GPT 4.1 8 July 2025 release | AJF | ASR | 99.8 | 5 | 1mo ago |
| GPT-4o 29 May 2025 release | AJF | ASR | 98.46 | 5 | 1mo ago |
| SUDO (full) | Direct Prompting | ASR (%) | 0 | 5 | 1mo ago |
| HarmfulQ | DS-R1 | ASR | 18 | 3 | 4d ago |
| AdvBench | Original | Refusal Rate (AdvBench) | 94 | 3 | 1mo ago |
| AdvBench Multiple-choice format (full) | Original | Safe Option Probability | 99 | 3 | 1mo ago |
| WildJailbreak Forbidden Questions (Overall) | Amnesia | ASR | 92.1 | 2 | 1mo ago |
| AdvBench RPO defense (full) | TAO-Attack | ASR | 92 | 2 | 1mo ago |
| AdvBench PAT defense (full) | TAO-Attack | ASR | 80 | 2 | 1mo ago |
Showing 25 of 25 rows