SOTA Jailbreak Defense benchmarks and papers with code | Wizwand
Jailbreak Defense
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| JBB-Behaviors | SAGE | ASR | 0 | 121 | 2mo ago |
| AdvBench | LoRA | ASR (PAIR) | 0 | 115 | 2d ago |
| Wild Jailbreak | Dream + DIFFUGUARD | ASR | 0.1 | 114 | 1mo ago |
| PAIR | Self-Examination | ASR | 0 | 97 | 1mo ago |
| HarmBench | ICD | PAIR ASR | 0 | 91 | 2d ago |
| GCG | PPL | ASR | 0 | 91 | 2mo ago |
| DeepInception | Self-Reminder | Harmful Score | 1 | 58 | 3mo ago |
| AutoDAN | SafeDecoding | ASR | 0 | 55 | 1mo ago |
| JBC | PreSafe | ASR | 0 | 54 | 2mo ago |
| StrongReject | R2D | Attack Success Rate | 1.5 | 54 | 1mo ago |
| HarmBench and AdvBench (test) | No Defense | GCG Score | 91.2 | 44 | 3mo ago |
| Behaviours (test) | VICUNA | ASR | 0.9 | 44 | 3mo ago |
| ReNeLLM | IA | Harmful Score | 1 | 42 | 3mo ago |
| AdvBench PAD | LLaDA-1.5 + Self-reminder + DIFFUGUARD | ASR | 12.12 | 40 | 2mo ago |
| Manual (IJP) | Prompt Guard | ASR | 0 | 38 | 2mo ago |
| MultiJail | SelfGrader | ASR | 0 | 36 | 2mo ago |
| LLaVA v1.5 | NullSteer | ASR | 3.18 | 36 | 2mo ago |
| Qwen2-VL | Benign image | ASR | 0 | 36 | 2mo ago |
| MiniGPT-4 | NullSteer | Attack Success Rate (ASR) | 7.32 | 36 | 2mo ago |
| AdvBench PAIR attack | SmoothLLM | DSR | 98 | 35 | 3mo ago |
| ActorAttack | Token Highlighter | Attack Success Rate (ASR) | 0 | 34 | 2mo ago |
| MaliciousInstruct | SafeDecoding | ASR (GCG) | 0 | 30 | 19d ago |
| JailbreakBench | Circuit Breakers | ASR (GCG) | 0 | 30 | 19d ago |
| Jailbreak Attack Suite | Jailbreak Antidote | AIM Defense Rate | 100 | 24 | 1mo ago |
| jailbreak defense dataset | Gradient-guided Token Masking | ASR | 0 | 24 | 8d ago |
Showing 25 of 106 rows