Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples

About

Adversarial training and its variants have become de facto standards for learning robust deep neural networks. In this paper, we explore the landscape around adversarial training in a bid to uncover its limits. We systematically study the effect of different training losses, model sizes, activation functions, the addition of unlabeled data (through pseudo-labeling) and other factors on adversarial robustness. We discover that it is possible to train robust models that go well beyond state-of-the-art results by combining larger models, Swish/SiLU activations and model weight averaging. We demonstrate large improvements on CIFAR-10 and CIFAR-100 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $8/255$ and $128/255$, respectively. In the setting with additional unlabeled data, we obtain an accuracy under attack of 65.88% against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-10 (+6.35% with respect to prior art). Without additional data, we obtain an accuracy under attack of 57.20% (+3.46%). To test the generality of our findings and without any additional modifications, we obtain an accuracy under attack of 80.53% (+7.62%) against $\ell_2$ perturbations of size $128/255$ on CIFAR-10, and of 36.88% (+8.46%) against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-100. All models are available at https://github.com/deepmind/deepmind-research/tree/master/adversarial_robustness.
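
To make the quantities in the abstract concrete, here is a minimal sketch of three of the ingredients it names: keeping a perturbation inside the $\ell_\infty$ or $\ell_2$ ball of the stated radius, the Swish/SiLU activation, and weight averaging. It is illustrative and not the authors' code. The function names are hypothetical, and the exponential moving average is an assumption: the abstract says only "model weight averaging" and does not specify the scheme.

```python
import math

# Perturbation budgets quoted in the abstract (pixel values scaled to [0, 1]).
EPS_LINF = 8 / 255
EPS_L2 = 128 / 255


def project_linf(delta, eps):
    """Clip every coordinate of the perturbation into [-eps, eps]."""
    return [max(-eps, min(eps, d)) for d in delta]


def project_l2(delta, eps):
    """Rescale the perturbation so that its Euclidean norm is at most eps."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    return [d * eps / norm for d in delta]


def silu(x):
    """Swish/SiLU activation, x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def ema_update(avg_weights, weights, decay):
    """One averaging step (assumed scheme: exponential moving average)."""
    return [decay * a + (1.0 - decay) * w for a, w in zip(avg_weights, weights)]


if __name__ == "__main__":
    delta = [0.1, -0.2, 0.01]
    print(project_linf(delta, EPS_LINF))  # coordinates clipped to +/- 8/255
    print(project_l2(delta, EPS_L2))      # unchanged if already inside the ball
    print(silu(1.0))
    print(ema_update([1.0, 1.0], [0.0, 2.0], decay=0.9))
```

In an adversarial training loop, a projection like this would typically be applied after each attack step so the perturbed input stays within the threat model, and the averaged weights would be the ones evaluated at test time.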

Sven Gowal, Chongli Qin, Jonathan Uesato, Timothy Mann, Pushmeet Kohli • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-10 (test) | Accuracy (Clean) | 94.74 | 273 |
| Image Classification | CIFAR-100 | Nominal Accuracy | 36.88 | 116 |
| Adversarial Robustness | CIFAR-10 (test) | -- | -- | 76 |
| Image Classification | CIFAR-100 (test) | Clean Accuracy | 69.17 | 61 |
| Image Classification | CIFAR-10 | AA Accuracy | 65.88 | 38 |
| Image Classification | CIFAR-10 512-image subset (test) | Clean Accuracy | 89.48 | 26 |
| Image Classification | SVHN (test) | Accuracy (Clean) | 93.03 | 17 |
| Image Classification | CIFAR-10 24 (test) | Standard Accuracy | 91.1 | 14 |
| Image Classification | CIFAR-10 (test) | AutoAttack Accuracy | 65.88 | 14 |
| Image Classification | CIFAR-10 l_inf threat model, epsilon=8/255 1.0 (test) | Standard Accuracy | 88.75 | 11 |
Showing 10 of 17 rows
