
Harmful requests and jailbreak prompts

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Attack Success Rate | 20,000 harmful requests and 20,000 jailbreak prompts (test) | Attack Success Rate (ASR): 80.2 | 18
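The page reports an Attack Success Rate without stating how it is computed. ASR is commonly defined as the share of attack attempts (here, harmful requests or jailbreak prompts) that are judged successful, usually expressed as a percentage. The minimal sketch below assumes that definition. The function name `attack_success_rate` and the one-boolean-per-attempt input are illustrative assumptions, and the sketch does not model how a response is judged "successful".

```python
from typing import Sequence


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Return ASR as a percentage of attack attempts judged successful.

    Hypothetical helper (assumption): `judgments` holds one boolean per attack
    prompt, where True means the model's response was judged to comply with the
    harmful request. The judging procedure itself is not specified by the page.
    """
    if not judgments:
        raise ValueError("need at least one attack attempt")
    # sum() counts True values as 1 and False values as 0.
    return 100.0 * sum(judgments) / len(judgments)


# Hypothetical toy data: 4 successful attacks out of 5 attempts.
print(attack_success_rate([True, True, False, True, True]))  # 80.0
```

Under this definition, a higher ASR means the attacked model refused less often, so lower values indicate a more robust model.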