# Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

## About
The robustness of neural-network-based classifiers against adversarial manipulation is mostly evaluated with empirical attacks, since methods for its exact computation, even when available, do not scale to large networks. In this paper we propose a new white-box adversarial attack w.r.t. the $l_p$-norms for $p \in \{1, 2, \infty\}$ that aims to find the minimal perturbation necessary to change the class of a given input. The attack has an intuitive geometric meaning, quickly yields high-quality results, and minimizes the size of the perturbation, so that a single run returns the robust accuracy at every threshold. It performs on par with or better than state-of-the-art attacks, which are often specialized to a single $l_p$-norm, and is robust to the phenomenon of gradient masking.
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Adversarial Attack | MNIST (test) | Median $\Vert\delta\Vert_p$ | 0.138 | 21 |
| Polyp Detection | Kvasir 1.0 (test) | Precision | 95.5 | 12 |
| Polyp Detection | In-house 1.0 (test) | Precision | 90.1 | 12 |
| Adversarial Attack | MNIST | Avg Latency (ms) | 8.88 | 6 |
| Adversarial Attack | CIFAR10 (test) | Median $\Vert\delta\Vert_p$ | 4.79 | 5 |
| Adversarial Attack | CIFAR10 | Avg Query Time (ms) | 108.9 | 3 |