
AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models

About

Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. They pose severe security problems for deep learning applications because they can effectively bypass defense mechanisms. However, previous attack methods often inject Projected Gradient Descent (PGD) gradients directly into the sampling of generative models. This approach lacks theoretical grounding and, by incorporating adversarial objectives, tends to generate unrealistic examples, especially for GAN-based methods on large-scale datasets such as ImageNet. In this paper, we propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models. We design two novel adversarial guidance techniques to conduct adversarial sampling in the reverse generation process of diffusion models. These two techniques are effective and stable in generating high-quality, realistic adversarial examples because they integrate gradients of the target classifier in an interpretable way. Experimental results on the MNIST and ImageNet datasets demonstrate that AdvDiff is effective in generating unrestricted adversarial examples and that it outperforms state-of-the-art unrestricted adversarial attack methods in terms of attack performance and generation quality.

Xuelong Dai, Kaisheng Liang, Bin Xiao • 2023
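
The abstract describes steering the reverse generation process of a diffusion model with gradients of the target classifier, but it does not spell out the two guidance techniques. As a rough illustration of the general idea only, the sketch below uses the standard classifier-guidance form, where the denoiser's predicted mean is shifted along the gradient of log p(target | x) before noise is added. It is not the AdvDiff algorithm. Everything in it is an assumption made for illustration: the linear softmax classifier, the toy denoiser `toy_denoiser_mean`, the noise schedule, and all function and parameter names.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(W, b, x):
    """Class probabilities of a toy linear softmax classifier (stand-in for the target model)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def grad_log_prob(W, b, x, target):
    """Analytic gradient of log p(target | x) with respect to x for the linear classifier."""
    p = class_probs(W, b, x)
    return [W[target][j] - sum(p[k] * W[k][j] for k in range(len(W)))
            for j in range(len(x))]


def guided_reverse_step(mean, sigma, W, b, target, scale, rng):
    """One guided step: shift the predicted mean along the classifier gradient, then add noise."""
    g = grad_log_prob(W, b, mean, target)
    return [m + scale * sigma ** 2 * gj + sigma * rng.gauss(0.0, 1.0)
            for m, gj in zip(mean, g)]


def toy_denoiser_mean(x, center, rate=0.2):
    """Hypothetical stand-in for a trained diffusion model: pull x toward a data center."""
    return [xi + rate * (ci - xi) for xi, ci in zip(x, center)]


def sample(W, b, center, target, scale, steps=50, seed=0):
    """Run the toy reverse process from Gaussian noise, guided toward the target class."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in center]
    for t in range(steps, 0, -1):
        sigma = 0.5 * t / steps  # assumed toy noise schedule, not taken from the paper
        mean = toy_denoiser_mean(x, center)
        x = guided_reverse_step(mean, sigma, W, b, target, scale, rng)
    return x


if __name__ == "__main__":
    W = [[1.0, 0.0], [-1.0, 0.0]]  # class 0 prefers x[0] > 0, class 1 prefers x[0] < 0
    b = [0.0, 0.0]
    center = [2.0, 0.0]            # the toy "data" sits in class 0 territory
    for scale in (0.0, 5.0):
        x = sample(W, b, center, target=1, scale=scale)
        print(scale, class_probs(W, b, x)[1])
```

With `scale=0.0` the loop is plain, unguided sampling; a positive `scale` pushes each step toward inputs the classifier assigns to the attack target. In a real setting the gradient would come from automatic differentiation through the target classifier rather than from a closed form.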

Related benchmarks

Task                          Dataset          Metric      Result  Rank
Image Generation              ImageNet         FID         13.5    68
Image Generation              CIFAR100         FID         44.1    51
Risky Sample Transferability  ImageNet         Error Rate  6.1     30
Image Classification          NICO++ OOD       Accuracy    74.6    24
Image Classification          PACS OOD (test)  Accuracy    78.3    24
Image Classification          PACS ID (train)  Accuracy    95.8    24
Image Classification          NICO++ (ID)      Accuracy    84.7    24
Risky Sample Transferability  CIFAR-100        Error Rate  41.1    18
Risky sample generation       CIFAR-100        Error Rate  44.1    12
Risky sample generation       PACS             Error Rate  14.5    12
Showing 10 of 16 rows
