SOTA Jailbreak Defense benchmarks and papers with code | Wizwand
Jailbreak Defense
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| JBB-Behaviors | SAGE | ASR | 0 | 121 | 22d ago |
| Wild Jailbreak | Dream + DIFFUGUARD | ASR | 0.1 | 114 | 4d ago |
| PAIR | Self-Examination | ASR | 0 | 97 | 4d ago |
| GCG | PPL | ASR | 0 | 91 | 1mo ago |
| DeepInception | Self-Reminder | Harmful Score | 1 | 58 | 1mo ago |
| JBC | PreSafe | ASR | 0 | 54 | 1mo ago |
| StrongReject | R2D | Attack Success Rate | 1.5 | 54 | 4d ago |
| AutoDAN | SafeDecoding | ASR | 0 | 51 | 1mo ago |
| AdvBench | Self-Reminder | ASR (Overall) | 0 | 49 | 1mo ago |
| HarmBench and AdvBench (test) | No Defense | GCG Score | 91.2 | 44 | 1mo ago |
| Behaviours (test) | VICUNA | ASR | 0.9 | 44 | 1mo ago |
| ReNeLLM | IA | Harmful Score | 1 | 42 | 1mo ago |
| AdvBench PAD | LLaDA-1.5 + Self-reminder + DIFFUGUARD | ASR | 12.12 | 40 | 22d ago |
| Manual (IJP) | Prompt Guard | ASR | 0 | 38 | 15d ago |
| MultiJail | SelfGrader | ASR | 0 | 36 | 15d ago |
| LLaVA v1.5 | NullSteer | ASR | 3.18 | 36 | 25d ago |
| Qwen2-VL | Benign image | ASR | 0 | 36 | 25d ago |
| MiniGPT-4 | NullSteer | Attack Success Rate (ASR) | 7.32 | 36 | 25d ago |
| AdvBench PAIR attack | SmoothLLM | DSR | 98 | 35 | 1mo ago |
| ActorAttack | Token Highlighter | Attack Success Rate (ASR) | 0 | 34 | 15d ago |
| HADES | JRS-Rem | ASR | 3.6 | 24 | 1mo ago |
| Jailbreak Attack Benchmarks (GPTFuzz, TAP, GCG, AutoDAN, Template) | SFT | GPTFuzz ASR | 24.98 | 24 | 1mo ago |
| DrAttack | SelfGrader | ASR | 0 | 22 | 15d ago |
| JailbreakBench and AdvBench | Certified Semantic Smoothing | ASR | 0.1 | 21 | 1mo ago |
| Aggregate Benchmarks | SAGE | Harmful Score | 1.06 | 21 | 1mo ago |
Showing 25 of 86 rows