MMA-Diffusion: MultiModal Attack on Diffusion Models
About
In recent years, Text-to-Image (T2I) models have advanced remarkably and gained widespread adoption. However, this progress has also opened avenues for misuse, particularly the generation of inappropriate or Not-Safe-For-Work (NSFW) content. Our work introduces MMA-Diffusion, a framework that poses a significant and realistic threat to the security of T2I models by circumventing current defensive measures in both open-source models and commercial online services. Unlike previous approaches, MMA-Diffusion leverages both textual and visual modalities to bypass safeguards such as prompt filters and post-hoc safety checkers, exposing the vulnerabilities of existing defense mechanisms.
Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Tsung-Yi Ho, Nan Xu, Qiang Xu• 2023
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Nudity Erasure | Nudity Erasure | ASR | 67 | 48 |
| Jailbreaking | MHSC | ASR-4 | 26.5 | 44 |
| Jailbreaking | Q16 | ASR-4 | 29.5 | 44 |
| Jailbreaking | Unsafe Prompts | Bypass Success Rate (Text) | 58 | 22 |
| Textual Modal Attack | LAION-COCO subset, UnsafeDiff, and I2P NSFW prompts (test) | Q16 ASR (Step 4) | 84.9 | 15 |
| Radiology Report Generation | IU-Xray | ROUGE-L Score | 32.95 | 12 |
| Jailbreak Attack | VBCDE | ASR | 8 | 12 |
| Jailbreak Attack | UnsafeDiff | Attack Success Rate (ASR) | 7.3 | 12 |
| Text-to-Image Adversarial Attack | I2P matching categories subset | Bypass Rate | 96.7 | 11 |
| Jailbreak Attack | I2P | SC ASR (4 attempts) | 79.14 | 11 |
(10 of 25 benchmark results shown.)
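Several rows report an attack success rate (ASR) either per single attempt or over multiple attempts ("ASR-4", "SC ASR (4 attempts)"). The sketch below shows one common way such a best-of-k rate can be computed. It assumes that a prompt counts as a success when at least one of its first k generations passes the attack criterion; the actual success criterion (for example a Q16 or safety-checker verdict) and the exact definition used by each benchmark are not given on this page and may differ. The function name and the boolean-matrix encoding are hypothetical, and the toy data is illustrative only, not results from the paper.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[Sequence[bool]], k: int = 1) -> float:
    """Percentage of prompts for which at least one of the first k attempts succeeded.

    Hypothetical encoding: outcomes[i][j] is True if attempt j on prompt i was
    judged a successful attack under some benchmark-specific criterion.
    """
    if not outcomes:
        raise ValueError("need at least one prompt")
    if k < 1:
        raise ValueError("k must be >= 1")
    # A prompt is a hit if any of its first k attempts succeeded (best-of-k).
    hits = sum(1 for attempts in outcomes if any(attempts[:k]))
    return 100.0 * hits / len(outcomes)


# Toy data: 4 prompts with 4 attempts each (illustrative only).
toy = [
    [False, False, True, False],
    [False, False, False, False],
    [True, True, False, False],
    [False, False, False, False],
]

print(attack_success_rate(toy, k=1))  # 25.0
print(attack_success_rate(toy, k=4))  # 50.0
```

Under this assumed definition, a best-of-4 rate can never be lower than the single-attempt rate for the same prompts, because every prompt that succeeds on its first attempt also succeeds within four.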