SOTA Jailbreak evaluation benchmarks and papers with code

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Last Updated
Harmful Prompts Curated April 13, 2023	-	Bad Bot Rate	0	61	4mo ago
WildJailbreak	-	Performance Rate	98.5	22	1mo ago
Red Queen SET	Qwen 3.5 35B	Passed Count	23	18	3mo ago
JailBreak R1	SInternal	Attack Success Rate (ASR)	1.3	12	2mo ago
curated dataset (test)	-	BAD BOT Rate	0	11	4mo ago
Multilingual Jailbreak Dataset (Evaluation set)	-	JSR	2.3	10	2mo ago
StrongReject	τ_trigger ⊕ PAP	ASR-J	95.5	9	2mo ago
Synthetic dataset (held-out)	-	Good Bot Rate	100	8	4mo ago
Fortress	PolicyAlign	Jailbreak Success Rate	13.6	4	1mo ago
sexual-content prompts	gpt-5-thinking	Non-Unsafe Rate	99.5	4	4mo ago
abuse, disinformation, hate prompts	gpt-5-thinking	Not Unsafe Rate	99.9	4	4mo ago
violence prompts	gpt-5-thinking	Non-Unsafe Rate	99.9	4	4mo ago
illicit non-violent crime prompts	gpt-5-thinking	Not Unsafe Rate	99.5	4	4mo ago
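Most results in the table are rates over a prompt set, but the listing does not define how each rate is computed. As a minimal sketch, assuming every prompt's response receives a binary unsafe / not-unsafe label from some judge, an attack success rate is the percentage of prompts judged unsafe, and a "not unsafe" rate is its complement. The `Judgment` record and both helper functions are hypothetical names for illustration only; individual benchmarks may define their metrics differently (for example with graded rather than binary judgments), so mapping these helpers onto specific rows is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Judgment:
    """Hypothetical per-prompt record: was the model's response judged unsafe?"""
    prompt_id: str
    unsafe: bool


def attack_success_rate(judgments: Iterable[Judgment]) -> float:
    """Percentage of prompts whose response was judged unsafe (attack succeeded).

    Assumes binary judge labels; raises ValueError on an empty prompt set.
    """
    items = list(judgments)
    if not items:
        raise ValueError("no judgments to score")
    successes = sum(1 for j in items if j.unsafe)
    return 100.0 * successes / len(items)


def not_unsafe_rate(judgments: Iterable[Judgment]) -> float:
    """Complement of the attack success rate: percentage of prompts NOT judged unsafe."""
    return 100.0 - attack_success_rate(judgments)


if __name__ == "__main__":
    # Four prompts, one of which produced a response judged unsafe.
    sample = [
        Judgment("p1", False),
        Judgment("p2", True),
        Judgment("p3", False),
        Judgment("p4", False),
    ]
    print(attack_success_rate(sample))  # 25.0
    print(not_unsafe_rate(sample))      # 75.0
```

Under this assumption, a higher "not unsafe" rate and a lower attack success rate both indicate a more robust model, which is why the two kinds of rows in the table move in opposite directions.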
