TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

About

Diffusion models have achieved great success in a range of tasks, such as image synthesis and molecule design. As such successes hinge on large-scale training data collected from diverse sources, the trustworthiness of these collected data is hard to control or audit. In this work, we aim to explore the vulnerabilities of diffusion models under potential training data manipulations and try to answer: How hard is it to perform Trojan attacks on well-trained diffusion models? What are the adversarial targets that such Trojan attacks can achieve? To answer these questions, we propose an effective Trojan attack against diffusion models, TrojDiff, which optimizes the Trojan diffusion and generative processes during training. In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution and propose a new parameterization of the Trojan generative process that leads to an effective training objective for the attack. In addition, we consider three types of adversarial targets: the Trojaned diffusion model always outputs instances of a certain class from the in-domain distribution (In-D2D attack), instances from an out-of-domain distribution (Out-D2D attack), or one specific instance (D2I attack). We evaluate TrojDiff on the CIFAR-10 and CelebA datasets against both DDPM and DDIM diffusion models. We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers, while the performance in benign environments is preserved. The code is available at https://github.com/chenweixin107/TrojDiff.
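
To make the phrase "diffuse adversarial targets into a biased Gaussian distribution" concrete, here is a minimal sketch that contrasts a standard DDPM-style forward sample with a shifted one. The abstract does not spell out the transition, so the closed form used below (a mean shift `mu` and noise scale `gamma`), the linear noise schedule, and every function name are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule; the default endpoints are common DDPM values (assumed)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas):
    """Cumulative product of (1 - beta_t), one entry per timestep."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def benign_sample(x0, a_bar_t, rng):
    """Benign forward sample: x_t ~ N(sqrt(a_bar) * x0, (1 - a_bar) * I)."""
    s, n = math.sqrt(a_bar_t), math.sqrt(1.0 - a_bar_t)
    return [s * x + n * rng.gauss(0.0, 1.0) for x in x0]


def trojan_sample(x0_target, a_bar_t, mu, gamma, rng):
    """Assumed biased forward sample:
    x_t ~ N(sqrt(a_bar) * x0 + sqrt(1 - a_bar) * mu, (1 - a_bar) * gamma^2 * I).
    As a_bar -> 0 this approaches the biased Gaussian N(mu, gamma^2 * I).
    """
    s, n = math.sqrt(a_bar_t), math.sqrt(1.0 - a_bar_t)
    return [
        s * x + n * (m + gamma * rng.gauss(0.0, 1.0))
        for x, m in zip(x0_target, mu)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    a_bars = alpha_bar(linear_beta_schedule(1000))
    target = [1.0, -1.0, 0.5]   # hypothetical flattened target image
    shift = [0.4, 0.4, 0.4]     # hypothetical trigger-dependent mean shift
    print(benign_sample(target, a_bars[-1], rng))
    print(trojan_sample(target, a_bars[-1], shift, 0.6, rng))
```

Under these assumptions the only difference between the two processes is where the chain ends up: the benign one converges to standard Gaussian noise, while the shifted one converges to noise centred on `mu`. A sampler started from that shifted noise could then be trained to denoise toward the adversarial target.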

Weixin Chen, Dawn Song, Bo Li • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Trojan attack evaluation on diffusion models | CelebA | FID | 5.4 | 11 |
| Image Generation | CIFAR-10 (test) | FID | 4.59 | 11 |
| Trojan attack evaluation on diffusion models | CIFAR-10 | FID | 4.28 | 11 |
| Secret image extraction | CIFAR10 32x32 | PSNR | 46.54 | 10 |
| Secret image extraction | LSUN Bedroom 256x256 | PSNR | 24.74 | 10 |
| Diffusion Backdoor Attack | CelebA-HQ | FID | 47.06 | 6 |
| Diffusion Backdoor Attack | CIFAR-10 | FID | 43.25 | 6 |
| Neural Steganography | CIFAR10 32x32 resolution | FID | 4.64 | 5 |
| Neural Steganography | LSUN Bedroom 256x256 resolution | FID | 14.36 | 5 |
| Backdoor Attack | Diffusion Models Image-to-image | ASR | 90 | 4 |
