Guessing Smart: Biased Sampling for Efficient Black-Box Adversarial Attacks
About
We consider adversarial examples for image classification in the black-box decision-based setting. Here, an attacker cannot access confidence scores, but only the final label. Most attacks for this scenario are either unreliable or inefficient. Focusing on the latter, we show that a specific class of attacks, Boundary Attacks, can be reinterpreted as a biased sampling framework that gains efficiency from domain knowledge. We identify three such biases, image frequency, regional masks and surrogate gradients, and evaluate their performance against an ImageNet classifier. We show that the combination of these biases outperforms the state of the art by a wide margin. We also showcase an efficient way to attack the Google Cloud Vision API, where we craft convincing perturbations with just a few hundred queries. Finally, the methods we propose have also been found to work very well against strong defenses: Our targeted attack won second place in the NeurIPS 2018 Adversarial Vision Challenge.
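To make the idea of biased sampling concrete, here is a minimal sketch of how one candidate perturbation might be drawn when all three biases are combined. It is not the authors' implementation. The helper names (`low_frequency_noise`, `biased_candidate`), the parameters (`cutoff`, `grad_weight`, `step`), the use of an FFT low-pass filter for the frequency bias, and the choice of the pixel-wise difference between the two images as the regional mask are all assumptions made for illustration. Images are assumed to be 2-D grayscale arrays in [0, 1], and `surrogate_grad` stands in for a gradient obtained from some substitute model.

```python
import numpy as np

def low_frequency_noise(shape, cutoff, rng):
    """Gaussian noise with spatial frequencies above `cutoff` removed (hypothetical helper)."""
    h, w = shape
    spectrum = np.fft.fft2(rng.standard_normal((h, w)))
    fy = np.abs(np.fft.fftfreq(h))[:, None]   # vertical frequencies, cycles per pixel
    fx = np.abs(np.fft.fftfreq(w))[None, :]   # horizontal frequencies, cycles per pixel
    keep = (fy <= cutoff) & (fx <= cutoff)    # symmetric low-pass mask, so the result stays real
    return np.real(np.fft.ifft2(spectrum * keep))

def unit(v):
    """Scale an array to unit L2 norm, guarding against division by zero."""
    return v / (np.linalg.norm(v) + 1e-12)

def biased_candidate(x_adv, x_orig, surrogate_grad, rng,
                     cutoff=0.1, grad_weight=0.5, step=0.05):
    """Draw one candidate near x_adv using frequency, mask, and gradient biases.

    All parameter names and default values are illustrative assumptions.
    """
    # Regional mask (assumed form): focus on pixels where the two images still differ.
    mask = np.abs(x_adv - x_orig)
    mask = mask / (mask.max() + 1e-12)

    # Frequency bias: sample low-frequency noise instead of white noise, then mask it.
    direction = unit(low_frequency_noise(x_adv.shape, cutoff, rng) * mask)

    # Surrogate-gradient bias: blend the random direction with the substitute model's gradient.
    direction = unit((1.0 - grad_weight) * direction + grad_weight * unit(surrogate_grad))

    # Scale the step relative to the current distance from the original image.
    distance = np.linalg.norm(x_adv - x_orig)
    return np.clip(x_adv + step * distance * direction, 0.0, 1.0)
```

In a decision-based attack loop, a candidate like this would be sent to the black-box classifier and kept only if the returned label is still adversarial and the candidate lies closer to the original image. Setting `grad_weight=0` and a `cutoff` of 0.5 (the highest frequency `np.fft.fftfreq` produces) leaves only the mask bias, which gives a rough way to compare the contribution of each bias.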
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Adversarial Attack | ILSVRC 2012 (val) | Median L2 Distance: 4.182 | 112 |
| Adversarial Attack | ILSVRC 2012 | Median L2 Distance: 5.44 | 96 |
| Adversarial Attack | ImageNet-21K (val) | Median L2 Distance: 1.263 | 80 |
| Adversarial Attack | Tiny ImageNet (val) | Median L2 Distance: 0.23 | 64 |
| Adversarial Attack | ImageNet 21k (test) | Median L2 Distance: 4.008 | 64 |
| Untargeted Attack | ImageNet (test) | Mean L2 Distortion (2K Budget): 28.44 | 42 |
| Targeted Attack | ImageNet (test) | Mean L2 Distortion (2K Budget): 35.28 | 38 |
| Targeted Adversarial Attack | ILSVRC 2012 | Median Noise Magnitude: 67.728 | 7 |