
MMA-Diffusion: MultiModal Attack on Diffusion Models

About

In recent years, Text-to-Image (T2I) models have seen remarkable advancements and gained widespread adoption. However, this progress has inadvertently opened avenues for potential misuse, particularly in generating inappropriate or Not-Safe-For-Work (NSFW) content. Our work introduces MMA-Diffusion, a framework that presents a significant and realistic threat to the security of T2I models by effectively circumventing current defensive measures in both open-source models and commercial online services. Unlike previous approaches, MMA-Diffusion leverages both textual and visual modalities to bypass safeguards like prompt filters and post-hoc safety checkers, thus exposing the vulnerabilities in existing defense mechanisms.

Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Tsung-Yi Ho, Nan Xu, Qiang Xu • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Textual Modal Attack | LAION-COCO subset, UnsafeDiff, and I2P NSFW prompts (test) | Q16 ASR (Step 4): 84.9 | 15 |
| Adversarial Attack | DALL·E 3 commercial (test) | BR: 0.33 | 7 |
| Adversarial NSFW Image Generation | MHSC (test) | ASR-25: 49.57 | 5 |
| Adversarial NSFW Image Generation | SC (test) | ASR-25: 70 | 5 |
| Adversarial NSFW Image Generation | Average (Q16, MHSC, SC) calculated (test) | ASR-25: 59.66 | 5 |
| Adversarial NSFW Image Generation | Q16 (test) | ASR-25: 59.4 | 5 |
| Black-box NSFW Filter Attack | UnsafeDiff (test) | Adult Bypass Rate: 22 | 2 |
| Safety Filter Bypass | MMA-Diffusion | NSFW-TC: 6.2 | 1 |