Attacking Adversarial Attacks as A Defense

About

It is well known that adversarial attacks can fool deep neural networks with imperceptible perturbations. Although adversarial training significantly improves model robustness, failure cases of defense still broadly exist. In this work, we find that adversarial attacks can themselves be vulnerable to small perturbations. Namely, on adversarially trained models, perturbing adversarial examples with small random noise may invalidate their misled predictions. After carefully examining state-of-the-art attacks of various kinds, we find that all of these attacks have this deficiency to different extents. Motivated by this finding, we propose to counter attacks by crafting more effective defensive perturbations. Our defensive perturbations leverage the fact that adversarial training endows the ground-truth class with smaller local Lipschitzness. By simultaneously attacking all the classes, the misled predictions with larger Lipschitzness can be flipped into correct ones. We verify our defensive perturbation with both empirical experiments and theoretical analyses on a linear model. On CIFAR10, it boosts the state-of-the-art model from 66.16% to 72.66% against the four attacks of AutoAttack, including from 71.76% to 83.30% against the Square attack. On ImageNet, the top-1 robust accuracy of FastAT is improved from 33.18% to 38.54% under the 100-step PGD attack.
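The idea of "simultaneously attacking all the classes" can be illustrated on a toy linear softmax classifier, the same kind of model the abstract mentions for its theoretical analysis. The sketch below is not the authors' implementation. It assumes the defensive objective is the sum of cross-entropy losses over every class label, maximized with signed-gradient steps inside an L-infinity ball around the incoming input. The function names, the random weights, and the values of eps, step, and n_steps are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def all_class_loss(W, b, x):
    # Sum over every class c of the cross-entropy loss CE(Wx + b, c).
    # For logits z this equals C * logsumexp(z) - sum(z).
    z = W @ x + b
    lse = z.max() + np.log(np.exp(z - z.max()).sum())
    return float(len(z) * lse - z.sum())


def defensive_perturbation(W, b, x0, eps=8 / 255, step=2 / 255, n_steps=20):
    # Signed-gradient ascent on the all-class loss, projected back onto
    # the L-infinity ball of radius eps around the incoming input x0.
    # (Assumed objective and hyperparameters; see the lead-in.)
    x = x0.copy()
    n_classes = W.shape[0]
    for _ in range(n_steps):
        p = softmax(W @ x + b)
        # Gradient of C * logsumexp(z) - sum(z) with respect to x.
        grad = W.T @ (n_classes * p - 1.0)
        x = x + step * np.sign(grad)
        x = np.clip(x, x0 - eps, x0 + eps)
    return x


# Toy setup: 10 classes, 32 input features, random weights (hypothetical).
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 32))
b = rng.normal(size=10)
x_in = rng.uniform(size=32)  # stands in for a possibly adversarial input

x_def = defensive_perturbation(W, b, x_in)
print("max |x_def - x_in|:", np.max(np.abs(x_def - x_in)))
print("prediction before:", int(np.argmax(W @ x_in + b)))
print("prediction after: ", int(np.argmax(W @ x_def + b)))
```

With random weights the prediction may or may not change. The abstract's claim concerns adversarially trained models, where the ground-truth class is said to have smaller local Lipschitzness than the misled classes, and this toy model does not reproduce that property.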

Boxi Wu, Heng Pan, Li Shen, Jindong Gu, Shuai Zhao, Zhifeng Li, Deng Cai, Xiaofei He, Wei Liu • 2021

Related benchmarks

Task	Dataset	Result
Image Classification	StanfordCars	Robust Accuracy 5.8	100
Image Classification	OxfordPets	Robust Accuracy 13.9	71
Zero-shot Classification	CIFAR100	--	65
Zero-shot Classification	CIFAR10	Top-1 Clean Acc 84.1	62
Image Classification	Flowers102	Clean Accuracy 79.7	58
Classification	PCAM	Clean Accuracy 50.4	39
Image Classification	Country211	Clean Accuracy 21.1	38
Classification	FGVCAircraft	Robust Accuracy 1.4	38
Image Classification	CIFAR10	Clean Accuracy 92.1	37
Image Classification	CIFAR100	Clean Accuracy 67.9	36

Showing 10 of 58 rows
