SOTA LLM Jailbreaking benchmarks and papers with code | Wizwand
LLM Jailbreaking
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| JBB-Behaviors Scenario J3 | EvoJail | Hypervolume | 0.707 | 21 | 2mo ago |
| JBB-Behaviors Scenario J2 | EvoJail | Hypervolume | 0.691 | 21 | 2mo ago |
| JBB-Behaviors Scenario J1 | EvoJail | Hypervolume | 59.1 | 21 | 2mo ago |
| GPTFuzzer Scenario G3 | EvoJail | Hypervolume | 0.696 | 21 | 2mo ago |
| GPTFuzzer Scenario G2 | EvoJail | Hypervolume | 77 | 21 | 2mo ago |
| GPTFuzzer Scenario G1 | EvoJail | Hypervolume | 0.708 | 21 | 2mo ago |
| HarmBench text (test N = 320) | PEO | ASR-M | 93.75 | 16 | 1mo ago |
| AdvBench | PEO | ASR-M | 88.27 | 16 | 1mo ago |
| AdaSteer Evaluation Set (test) | SCAV | SRF | 1 | 14 | 13d ago |
| 100-query jailbreak set | Reward-Guided RRT | Jailbreak Success Rate | 46.4 | 8 | 3mo ago |
| Llama3-DeRTA | Adaptive Probe-based Steering | Success Rate First (SRF) | 61 | 6 | 13d ago |
| R2D2 | Adaptive Probe-based Steering | SRF | 31 | 6 | 13d ago |
| Llama3-CB | Adaptive Probe-based Steering | Success Rate First (SRF) | 70 | 6 | 13d ago |
| Llama3 TAR | Adaptive Probe-based Steering | Success Rate First (SRF) | 32 | 6 | 13d ago |
| Llama3-LAT | Adaptive Probe-based Steering | Success Rate First (SRF) | 71 | 6 | 13d ago |
| Llama3 RB | Adaptive Probe-based Steering | Success Rate First (SRF) | 71 | 6 | 13d ago |
| Mistral-RB | Adaptive Probe-based Steering | SRF | 58 | 6 | 13d ago |
| Mistral-SU | Adaptive Probe-based Steering | SRF (Mistral-SU) | 46 | 6 | 13d ago |
| Gemma-DA | RepE | SRF | 1 | 6 | 13d ago |
| Gemma 9b-it 2 | RD-C | SRF | 71 | 6 | 13d ago |
| Mistral-7B-Instruct v0.2 | Adaptive Probe-based Steering | Success Rate First (SRF) | 77 | 6 | 13d ago |
| AdvBench GPT-4 Series | AJF | ASR | 98.9 | 5 | 2mo ago |
| AdvBench Llama2-13b | GPTFuzz Top-5 | ASR | 95.4 | 5 | 2mo ago |
| AdvBench Llama2-7b | GPTFuzz Top-5 | Attack Success Rate (ASR) | 97.3 | 5 | 2mo ago |
| Mistral CB | Adaptive Probe-based Steering | Success Rate First (SRF) | 72 | 4 | 13d ago |
Showing 25 of 32 rows