Jailbreak Attacks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Defense | Jailbreak Attacks ReNeLLM, DeepInception, GPTFuzzer, CodeAttack (test) | Harmful Score (ReNeLLM): 1 | 6 |
| Attack Detection | Jailbreak Attacks 105K sample set | Detection Rate: 68 | 4 |