# Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

## About
The robustness of neural-network-based classifiers against adversarial manipulation is mostly evaluated with empirical attacks, since methods for its exact computation, even when available, do not scale to large networks. In this paper we propose a new white-box adversarial attack w.r.t. the $l_p$-norms for $p \in \{1, 2, \infty\}$ that aims to find the minimal perturbation necessary to change the class of a given input. The attack has an intuitive geometric meaning, quickly yields high-quality results, and minimizes the size of the perturbation, so that a single run returns the robust accuracy at every threshold. It performs on par with or better than state-of-the-art attacks, which are often specialized to a single $l_p$-norm, and is robust to the phenomenon of gradient masking.
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Adversarial Attack | MNIST (test) | Median $\Vert\delta\Vert_p$ | 0.138 | 21 |
| Polyp Detection | Kvasir 1.0 (test) | Precision | 95.5 | 12 |
| Polyp Detection | In-house 1.0 (test) | Precision | 90.1 | 12 |
| Adversarial Attack | MNIST | Avg Latency (ms) | 8.88 | 6 |
| Adversarial Attack | CIFAR10 (test) | Median $\Vert\delta\Vert_p$ | 4.79 | 5 |
| Adversarial Attack | CIFAR10 | Avg Query Time (ms) | 108.9 | 3 |