Can Adversarial Training Be Manipulated By Non-Robust Features?
About
Adversarial training, originally designed to resist test-time adversarial examples, has been shown to be promising in mitigating training-time availability attacks. This defense ability, however, is challenged in this paper. We identify a novel threat model named stability attacks, which aim to hinder robust availability by slightly manipulating the training data. Under this threat, we show that adversarial training using a conventional defense budget $\epsilon$ provably fails to provide test robustness in a simple statistical setting, where the non-robust features of the training data can be reinforced by $\epsilon$-bounded perturbations. Further, we analyze the necessity of enlarging the defense budget to counter stability attacks. Finally, comprehensive experiments demonstrate that stability attacks are harmful on benchmark datasets, and thus adaptive defenses are necessary to maintain robustness. Our code is available at https://github.com/TLMichael/Hypocritical-Perturbation.
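The core idea behind the hypocritical perturbation is to craft an $\epsilon$-bounded change to each training image that *reinforces* its non-robust features, i.e., the perturbation follows gradient descent on a crafting model's loss rather than gradient ascent as in standard adversarial attacks. The snippet below is a minimal PyTorch sketch of this idea, not the paper's exact procedure; the crafting model `crafter` and the step size / iteration count are illustrative assumptions.

```python
# Minimal sketch: epsilon-bounded "hypocritical" perturbation that *minimizes*
# the crafting model's loss (reinforcing non-robust features), in contrast to
# PGD adversarial examples, which maximize it.
import torch
import torch.nn.functional as F


def hypocritical_perturb(crafter, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Return x + delta with ||delta||_inf <= eps that decreases the loss."""
    crafter.eval()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(crafter(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            # Gradient *descent* on the loss (sign flipped vs. a PGD attack).
            delta -= alpha * grad.sign()
            delta.clamp_(-eps, eps)
            # Keep the perturbed image inside the valid pixel range [0, 1].
            delta.copy_((x + delta).clamp(0, 1) - x)
    return (x + delta).detach()
```

Training a victim model (even adversarially, with the same budget $\epsilon$) on data poisoned this way is what the stability-attack threat model studies; the repository linked above contains the authors' actual implementation.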
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | CIFAR-10 (test) | Accuracy: 86.9 | 3381 |
| Adversarial Robustness | CIFAR-10 (test) | -- | 76 |
| Adversarial Robustness | CIFAR-100 (test) | Natural Accuracy: 62.22 | 46 |
| Adversarial Robustness | Tiny ImageNet (test) | -- | 8 |
| Image Classification | CIFAR-10 hypocritically perturbed (test) | -- | 8 |
| Robustness to Adversarial Examples | SVHN (test) | Accuracy (FGSM): 59.41 | 7 |
| Adversarial Robustness | SVHN (test) | Natural Accuracy: 96.06 | 3 |