SOTA Jailbreak Attack Evaluation benchmarks and papers with code | Wizwand
Jailbreak Attack Evaluation
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| S-Eval Attack | Original | Attack Success Rate (ASR) | 92 | 72 | 3mo ago |
| SafeBench 100 sampled harmful queries | StructBreak | ASR | 97 | 48 | 8d ago |
| TRIDENT CORE | TRIDENT-CORE | HPR | 7 | 38 | 3mo ago |
| HarmBench (400 random samples) | Llama-2-7B-Chat (Original) | ASR | 0 | 18 | 3mo ago |
| SafetyBench MCV | Qwen2.5-VL-32B | ASR (1-Clip) | 79.79 | 16 | 1d ago |
| 500 randomly sampled prompts (test) | GCG | Similarity Score | 0.81 | 16 | 22d ago |
| JBB sampled harmful behaviors | M | PAIR Success Rate | 100 | 12 | 1mo ago |
| StealthGraph SG-Implicit | Grok 3 Mini | ASR | 91 | 12 | 3mo ago |
| AdvBench | Amnesia | ASR Success Rate | 86.3 | 9 | 8d ago |
| Paired Prompts Held-out (test) | PAIR | Similarity | 0.78 | 8 | 22d ago |
| TRIDENT-EDGE | TRIDENT-EDGE | HPR | 5 | 7 | 3mo ago |
| Five Safety Benchmarks AdvBench, HarmBench, HarmfulQ, JBBench, StrongReject | QwQ | ASR | 7.69 | 6 | 1mo ago |
| StealthGraph SG-Origin | Mixtral 8×7B | ASR | 39.5 | 6 | 3mo ago |
| HarmfulQA | DeepSeek V3.1 | ASR | 16 | 6 | 3mo ago |
| Do-Not-Answer | Gemini 2.5 Flash | ASR | 2.5 | 6 | 3mo ago |
| FigStep Average | SafeThink | Average ASR | 0.053 | 5 | 3mo ago |
| POLARIS | POLARIS | Attack Success Count | 520 | 2 | 8d ago |
| Curiosity | POLARIS | Successful Attack Count | 5 | 2 | 8d ago |
| SOS | POLARIS | Successful Attack Count | 878 | 2 | 8d ago |
| SORRY | POLARIS | Attack Success Count | 28 | 2 | 8d ago |
| JBB | POLARIS | Attack Success Count | 1 | 2 | 8d ago |
| AirBench | POLARIS | Attack Success Count | 1,390 | 2 | 8d ago |
Showing 22 of 22 rows