MMA-Diffusion: MultiModal Attack on Diffusion Models

About

In recent years, Text-to-Image (T2I) models have seen remarkable advancements, gaining widespread adoption. However, this progress has inadvertently opened avenues for potential misuse, particularly in generating inappropriate or Not-Safe-For-Work (NSFW) content. Our work introduces MMA-Diffusion, a framework that presents a significant and realistic threat to the security of T2I models by effectively circumventing current defensive measures in both open-source models and commercial online services. Unlike previous approaches, MMA-Diffusion leverages both textual and visual modalities to bypass safeguards like prompt filters and post-hoc safety checkers, thus exposing and highlighting the vulnerabilities in existing defense mechanisms.

Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Tsung-Yi Ho, Nan Xu, Qiang Xu • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Nudity Erasure | Nudity Erasure | ASR | 67 | 48 |
| Jailbreaking | MHSC | ASR-4 | 26.5 | 44 |
| Jailbreaking | Q16 | ASR-4 | 29.5 | 44 |
| Jailbreaking | Unsafe Prompts | Bypass Success Rate (Text) | 58 | 22 |
| Textual Modal Attack | LAION-COCO subset, UnsafeDiff, and I2P NSFW prompts (test) | Q16 ASR (Step 4) | 84.9 | 15 |
| Radiology Report Generation | IU-Xray | ROUGE-L Score | 32.95 | 12 |
| Jailbreak Attack | VBCDE | ASR | 8 | 12 |
| Jailbreak Attack | UnsafeDiff | Attack Success Rate (ASR) | 7.3 | 12 |
| Text-to-Image Adversarial Attack | I2P matching categories subset | Bypass Rate | 96.7 | 11 |
| Jailbreak Attack | I2P | SC ASR (4 attempts) | 79.14 | 11 |
Showing 10 of 25 rows
