Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Harmful Prompts

Benchmarks

Task NameDataset NameSOTA ResultTrend
Jailbreak evaluationHarmful Prompts Curated April 13, 2023
Bad Bot Rate0
61
Safety EvaluationHarmful Prompts
Harmful Score8.3
40
Safety EvaluationHarmful Prompts
ASR (Raw)2
15
Stealthiness EvaluationHarmful prompts (evaluated on 3 LLMs and 4 guard LLMs)
Mean Perplexity3.23
10
Safety EvaluationHarmful Prompts Text-only baseline
Text ASR0
10
Jailbreak Attack100 Harmful Prompts
ASR (K=2)55
9
Adversarial AttackHarmful Prompts
ASR70.3
8
Jailbreak AttackHarmful Prompts Victim: GPT-4o
Runtime (H:MM)4
5
Jailbreak AttackHarmful Prompts SDXL
Attack Success Rate (ASR)88
4
Showing 9 of 9 rows