malicious-prompts

Benchmarks

Task Name                  | Dataset Name                     | SOTA Result          | Trend
Jailbreak Attack           | Malicious Prompts English (test) | ASR@5: 100           | 72
Adversarial Attack         | 16 malicious prompts             | ASR: 0               | 40
Text-to-Image Generation   | Malicious Prompts                | FID-Censored: 372.38 | 6
Malicious Prompt Detection | ahsanayub/malicious-prompts      | Accuracy: 98.72      | 4