
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers

About

Recent works have shown the effectiveness of randomized smoothing as a scalable technique for building neural network-based classifiers that are provably robust to $\ell_2$-norm adversarial perturbations. In this paper, we employ adversarial training to improve the performance of randomized smoothing. We design an adapted attack for smoothed classifiers, and we show how this attack can be used in an adversarial training setting to boost the provable robustness of smoothed classifiers. We demonstrate through extensive experimentation that our method consistently outperforms all existing provably $\ell_2$-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state-of-the-art for provable $\ell_2$-defenses. Moreover, we find that pre-training and semi-supervised learning boost adversarially trained smoothed classifiers even further. Our code and trained models are available at http://github.com/Hadisalman/smoothing-adversarial .
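The core object in the abstract is the smoothed classifier: given a base classifier f, the smoothed classifier g predicts the class most likely under Gaussian perturbations of the input. The paper's released code is PyTorch-based; the sketch below is only a minimal, framework-free illustration of the Monte Carlo prediction step, with the function name `smoothed_predict` and the toy setup being assumptions for illustration, not the authors' API.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=100, rng=None):
    """Monte Carlo estimate of the smoothed classifier
        g(x) = argmax_c  P[ f(x + delta) = c ],  delta ~ N(0, sigma^2 I).

    `base_classifier` maps a batch of inputs (shape (n,) + x.shape)
    to an integer class label per input. Returns the majority-vote class.
    """
    rng = np.random.default_rng(rng)
    # Draw n_samples Gaussian perturbations of the input x.
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    # Classify every noisy copy with the base classifier.
    labels = base_classifier(x[None, ...] + noise)
    # Vote: the smoothed prediction is the most frequent base-classifier label.
    counts = np.bincount(labels)
    return int(np.argmax(counts))
```

The paper's contribution is to adversarially train f so that this vote is both accurate and certifiably stable; the prediction rule itself follows the standard randomized-smoothing recipe.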

Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, Sebastien Bubeck • 2019

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | MNIST | - | 263 |
| Certified Image Classification | MNIST (test) | Certified Accuracy (r=0.00): 99.39 | 27 |
| Image Classification Certified Robustness | MNIST (test) | Overall ACR: 1.779 | 27 |
| Certified Robustness | CIFAR-10 (test) | - | 26 |
| Certified Robust Classification | CIFAR-10 official (test) | ACR: 0.684 | 14 |
| Image Classification | ImageNet sub-sampled 500 samples (val) | ACR: 1.04 | 8 |
