HopSkipJumpAttack: A Query-Efficient Decision-Based Attack
About
The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary. The proposed family includes both untargeted and targeted attacks optimized for $\ell_2$ and $\ell_\infty$ similarity metrics respectively. Theoretical analysis is provided for the proposed algorithms and the gradient direction estimate. Experiments show HopSkipJumpAttack requires significantly fewer model queries than Boundary Attack. It also achieves competitive performance in attacking several widely-used defense mechanisms. (HopSkipJumpAttack was named Boundary Attack++ in a previous version of the preprint.)
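The gradient direction estimate mentioned above can be illustrated with a minimal sketch. It assumes the Monte Carlo form commonly associated with this attack: sample random unit directions around a point near the decision boundary, query only the hard-label decision at each perturbed point, and average the directions weighted by the resulting ±1 outcomes. The names `estimate_gradient_direction`, `is_adversarial`, `delta`, and `num_samples` are hypothetical and chosen for illustration; this is not the authors' reference implementation.

```python
import numpy as np


def estimate_gradient_direction(is_adversarial, x, delta=1e-2,
                                num_samples=1000, rng=None):
    """Estimate the gradient direction at a boundary point from label-only queries.

    is_adversarial: hypothetical callable returning True when the model's
        decision on the input counts as adversarial (the only model access used).
    x: point assumed to lie on or very near the decision boundary.
    delta: probe radius (an assumed small constant here).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Draw random directions uniformly on the unit sphere.
    u = rng.standard_normal((num_samples, x.size))
    u /= np.linalg.norm(u, axis=1, keepdims=True)

    # One binary query per direction: +1 if adversarial, -1 otherwise.
    phi = np.array([
        1.0 if is_adversarial(x + delta * ub.reshape(x.shape)) else -1.0
        for ub in u
    ])

    # Subtract the mean response as a baseline to reduce variance,
    # unless every query returned the same label.
    if np.abs(phi.mean()) < 1.0:
        phi = phi - phi.mean()

    # Average the directions weighted by the (centered) binary responses.
    grad = (phi[:, None] * u).mean(axis=0)
    norm = np.linalg.norm(grad)
    if norm > 0:
        grad = grad / norm
    return grad.reshape(x.shape)
```

For a toy linear decision rule such as `lambda z: w @ z > 0`, queried at a point on the boundary, the returned direction should align closely with `w`, which is the behaviour the estimate is meant to capture. Each call costs `num_samples` model queries, so the sample count trades estimate quality against query budget.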
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Hallucination Evaluation | POPE | -- | 935 |
| Black-box Attack | LSUN | ASR 95.8 | 189 |
| Black-box Attack | GenImage | ASR 99.2 | 162 |
| Adversarial Attack | ILSVRC 2012 (val) | Median L2 Distance 24.181 | 112 |
| Adversarial Attack | ILSVRC 2012 | Median L2 Distance 17.75 | 96 |
| Adversarial Attack | ImageNet-21K (val) | Median L2 Distance 4.367 | 80 |
| Adversarial Attack | Tiny ImageNet (val) | Median L2 Distance 0.959 | 64 |
| Adversarial Attack | ImageNet 21k (test) | Median L2 Distance 16.244 | 64 |
| Untargeted Attack | ImageNet (test) | Mean L2 Distortion (2K Budget) 44.53 | 42 |
| Targeted Attack | ImageNet (test) | Mean L2 Distortion (2K Budget) 50.96 | 38 |