DifAttack++: Query-Efficient Black-Box Adversarial Attack via Hierarchical Disentangled Feature Space in Cross-Domain
About
This work investigates efficient score-based black-box adversarial attacks that achieve a high Attack Success Rate (ASR) and good generalization ability. We propose a novel attack framework, termed DifAttack++, which operates in a hierarchical disentangled feature space and differs significantly from existing methods that manipulate the entire feature space. Specifically, DifAttack++ first disentangles an image's latent representation into an Adversarial Feature (AF) and a Visual Feature (VF) using an autoencoder equipped with a carefully designed Hierarchical Decouple-Fusion (HDF) module. In this formulation, the AF primarily governs the adversarial capability of an image, while the VF largely preserves its visual appearance. To enable feature disentanglement and image reconstruction, we jointly train two autoencoders, one for the clean image domain and one for the adversarial image domain (hence cross-domain), using paired clean images and their corresponding Adversarial Examples (AEs) generated by white-box attacks on available surrogate models. During the black-box attack stage, DifAttack++ iteratively optimizes the AF based on query feedback from the victim model, while keeping the VF fixed, until a successful AE is obtained. Extensive experimental results demonstrate that DifAttack++ achieves superior ASR and query efficiency compared to state-of-the-art methods, while producing AEs with comparable visual quality. Our code is available at https://github.com/csjunjun/DifAttackPlus.git.
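The attack stage described above (update the AF from query feedback, keep the VF fixed, stop at the first successful AE) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: `split_encode` and `fuse_decode` are hypothetical stand-ins for the trained autoencoder with its HDF module, `victim_scores` is a made-up two-class linear model, the L-infinity budget `eps` is not taken from the abstract, and plain random search is swapped in for whatever optimizer the paper actually uses.

```python
import numpy as np

def split_encode(x):
    """Hypothetical stand-in for the encoder + HDF module: returns (AF, VF)."""
    return x[:4].copy(), x[4:].copy()

def fuse_decode(af, vf):
    """Hypothetical stand-in for the decoder: fuses AF and VF into an image."""
    return np.concatenate([af, vf])

def victim_scores(x):
    """Toy black-box victim: softmax scores of a made-up 2-class linear model."""
    logits = np.array([x[:4].sum() - 1.9, 0.0])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def attack_af(x, true_label, eps=0.1, sigma=0.05, max_queries=1000, seed=0):
    """Untargeted attack: random search over the AF only, with the VF held fixed."""
    rng = np.random.default_rng(seed)
    af, vf = split_encode(x)  # vf is never modified below

    def query(af_candidate):
        # Decode, project onto the assumed L-infinity ball around x, query the victim.
        delta = np.clip(fuse_decode(af_candidate, vf) - x, -eps, eps)
        x_adv = np.clip(x + delta, 0.0, 1.0)
        s = victim_scores(x_adv)
        # Margin < 0 means the true class is no longer the top-scoring class.
        margin = s[true_label] - np.max(np.delete(s, true_label))
        return margin, x_adv

    best_margin, best_x = query(af)
    queries = 1
    while best_margin >= 0 and queries < max_queries:
        candidate = af + sigma * rng.standard_normal(af.shape)
        margin, x_adv = query(candidate)
        queries += 1
        if margin < best_margin:  # keep the candidate only if the scores improved
            af, best_margin, best_x = candidate, margin, x_adv
    return best_x, queries, bool(best_margin < 0)

# Toy usage: an 8-dimensional "image" that the victim assigns to class 0.
x = np.full(8, 0.5)
x_adv, queries, success = attack_af(x, true_label=0)
```

In this sketch the query count is simply the number of calls to `victim_scores`, which is the quantity a query-efficient attack tries to minimize. Because only the AF is perturbed, the coordinates carried by the VF pass through the toy decoder unchanged.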
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Targeted Score-based Black-box Attack | ImageNet | ASR: 100 | 96 |
| Untargeted Score-based Black-box Attack | ImageNet | ASR: 100 | 96 |
| Untargeted Adversarial Attack | ImageNet (test) | -- | 26 |
| Targeted Score-based Black-box Attack | Food101 | ASR: 90 | 6 |
| Targeted Score-based Black-box Attack | ObjectNet | ASR: 57.5 | 6 |
| Untargeted Score-based Black-box Attack | ObjectNet | ASR: 100 | 6 |
| Untargeted Score-based Black-box Attack | Food101 | ASR: 100 | 6 |
| Targeted Black-box Attack | Imagga API | ASR: 72.7 | 5 |
| Untargeted Black-box Attack | Imagga API | ASR: 86.7 | 5 |