MaliciousInstruct

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack | MaliciousInstruct | ASR 100 | 35 |
| Adversarial and Jailbreaking Attack Detection | MaliciousInstruct | AUROC 0.8825 | 20 |
| Visual Jailbreaking Attack | MaliciousInstruct | ASR 92 | 16 |
| Jailbreak Attack | MaliciousInstruct (test) | ASR (Refusal) 95 | 10 |
| Jailbreak Attack | MaliciousInstruct 41 (test) | ASR 0.935 | 6 |