SOTA LLM Jailbreaking benchmarks and papers with code

Benchmarks

Dataset Name	SOTA Method	Metric	Score	Count	Last Updated
JBB-Behaviors Scenario J3	EvoJail	Hypervolume	0.707	21	4mo ago
JBB-Behaviors Scenario J2	EvoJail	Hypervolume	0.691	21	4mo ago
JBB-Behaviors Scenario J1	EvoJail	Hypervolume	59.1	21	4mo ago
GPTFuzzer Scenario G3	EvoJail	Hypervolume	0.696	21	4mo ago
GPTFuzzer Scenario G2	EvoJail	Hypervolume	77	21	4mo ago
GPTFuzzer Scenario G1	EvoJail	Hypervolume	0.708	21	4mo ago
HarmBench text (test N = 320)	PEO	ASR-M	93.75	16	2mo ago
AdvBench	PEO	ASR-M	88.27	16	2mo ago
AdaSteer Evaluation Set (test)	SCAV	Success Rate First (SRF)	1	14	2mo ago
100-query jailbreak set		Jailbreak Success Rate	46.4	8	4mo ago
Llama3-DeRTA	Adaptive Probe-based Steering	Success Rate First (SRF)	61	6	2mo ago
R2D2	Adaptive Probe-based Steering	Success Rate First (SRF)	31	6	2mo ago
Llama3-CB	Adaptive Probe-based Steering	Success Rate First (SRF)	70	6	2mo ago
Llama3 TAR	Adaptive Probe-based Steering	Success Rate First (SRF)	32	6	2mo ago
Llama3-LAT	Adaptive Probe-based Steering	Success Rate First (SRF)	71	6	2mo ago
Llama3 RB	Adaptive Probe-based Steering	Success Rate First (SRF)	71	6	2mo ago
Mistral-RB	Adaptive Probe-based Steering	Success Rate First (SRF)	58	6	2mo ago
Mistral-SU	Adaptive Probe-based Steering	Success Rate First (SRF)	46	6	2mo ago
Gemma-DA	RepE	Success Rate First (SRF)	1	6	2mo ago
Gemma 9b-it 2	RD-C	Success Rate First (SRF)	71	6	2mo ago
Mistral-7B-Instruct v0.2	Adaptive Probe-based Steering	Success Rate First (SRF)	77	6	2mo ago
AdvBench GPT-4 Series	AJF	Attack Success Rate (ASR)	98.9	5	4mo ago
AdvBench Llama2-13b	GPTFuzz Top-5	Attack Success Rate (ASR)	95.4	5	4mo ago
AdvBench Llama2-7b	GPTFuzz Top-5	Attack Success Rate (ASR)	97.3	5	4mo ago
Mistral CB	Adaptive Probe-based Steering	Success Rate First (SRF)	72	4	2mo ago

Showing 25 of 32 rows