
Guided Diffusion Model for Adversarial Purification

About

As deep neural networks (DNNs) see wider deployment across algorithms and frameworks, their security has become a growing concern. Adversarial attacks disturb DNN-based image classifiers: an attacker intentionally adds imperceptible perturbations to input images to fool the classifier. In this paper, we propose a novel purification approach, referred to as the guided diffusion model for purification (GDMP), to protect classifiers from adversarial attacks. The core idea is to embed purification into the denoising process of a Denoising Diffusion Probabilistic Model (DDPM): the forward diffusion process submerges adversarial perturbations under gradually added Gaussian noise, and both kinds of noise are then removed simultaneously by a guided reverse denoising process. In comprehensive experiments across various datasets, GDMP is shown to reduce adversarial perturbations to a negligible level, thereby significantly improving classification accuracy. GDMP improves robust accuracy by 5%, reaching 90.1% under PGD attack on the CIFAR-10 dataset. Moreover, GDMP achieves 70.94% robustness on the challenging ImageNet dataset.
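The purification loop described above can be sketched in a few lines. This is a toy illustration only, not the paper's implementation: `denoise_step` stands in for a trained DDPM reverse step, and the guidance term is simplified to a plain pull toward the input image rather than the gradient-based guidance used in GDMP. All function names and parameters here are hypothetical.

```python
import numpy as np

def diffuse(x0, t, betas, rng):
    """Forward DDPM process: sample from q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I)."""
    alphas_bar = np.cumprod(1.0 - betas)
    a = alphas_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * rng.standard_normal(x0.shape)

def purify(x_adv, t_star, betas, denoise_step, guidance_scale=0.1):
    """Toy GDMP-style purification: diffuse the (possibly adversarial) input
    for t_star steps so the perturbation is submerged in Gaussian noise, then
    denoise step by step while nudging each intermediate back toward the
    input (a stand-in for the paper's guided denoising)."""
    rng = np.random.default_rng(0)
    x = diffuse(x_adv, t_star, betas, rng)       # submerge the perturbation in noise
    for t in range(t_star, -1, -1):
        x = denoise_step(x, t)                   # one reverse-diffusion step (trained model)
        x = x - guidance_scale * (x - x_adv)     # guidance: keep the image content of the input
    return x
```

The guidance strength trades off noise removal against content preservation: with `guidance_scale=0` the loop is plain unconditional denoising, while larger values keep the output closer to the original image.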

Jinyi Wang, Zhaoyang Lyu, Dahua Lin, Bo Dai, Hongfei Fu • 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
Image Classification | CIFAR-10 (test) | Accuracy (Clean) | 93.5 | 273
Image Classification | CIFAR10-C (test) | Accuracy (Gaussian) | 81.91 | 52
Adversarial Purification | CIFAR-10 | Standard Accuracy | 93.1 | 30
Image Classification | CIFAR-10 512-image subset (test) | Clean Accuracy | 93.5 | 26
Adversarial Purification | CIFAR-10 (test) | Standard Accuracy | 93.1 | 24
Image Classification | ImageNet-1k 1.0 (test) | Accuracy (Clean) | 74.22 | 17
Image Classification | ImageNet 1k (test) | Clean Accuracy | 78.1 | 14
Image Classification | CIFAR-10 24 (test) | Standard Accuracy | 84.85 | 14
Image Classification | CIFAR-10 l2 threat model, epsilon=0.5 (test) | Standard Accuracy | 96.09 | 11
Image Classification | CIFAR-10 l_inf threat model, epsilon=8/255 1.0 (test) | Standard Accuracy | 92.97 | 11

Showing 10 of 13 rows.
