Harmful Prompts

Benchmarks

Task Name	Dataset Name	SOTA Result
Jailbreak evaluation	Harmful Prompts Curated April 13, 2023	Bad Bot Rate0	61
Safety Evaluation	Harmful Prompts	Harmful Score8.3	40
Safety Evaluation	Harmful Prompts	ASR (Raw)2	15
Stealthiness Evaluation	Harmful prompts (evaluated on 3 LLMs and 4 guard LLMs)	Mean Perplexity3.23	10
Safety Evaluation	Harmful Prompts Text-only baseline	Text ASR0	10
Jailbreak Attack	100 Harmful Prompts	ASR (K=2)55	9
Jailbreak Attack	Harmful Prompts model-averaged	Model Averaged ASR (0.8)4.8	8
Adversarial Attack	Harmful Prompts	ASR70.3	8
Jailbreak Attack	Harmful Prompts 8-model (test)	ASR92.4	5
Jailbreak Attack	Harmful Prompts Victim: GPT-4o	Runtime (H:MM)4	5
Jailbreak Attack	Harmful Prompts SDXL	Attack Success Rate (ASR)88	4

Showing 11 of 11 rows