DifAttack++: Query-Efficient Black-Box Adversarial Attack via Hierarchical Disentangled Feature Space in Cross-Domain

About

This work investigates efficient score-based black-box adversarial attacks that achieve a high Attack Success Rate (ASR) and good generalization ability. We propose a novel attack framework, termed DifAttack++, which operates in a hierarchical disentangled feature space and differs significantly from existing methods that manipulate the entire feature space. Specifically, DifAttack++ first disentangles an image's latent representation into an Adversarial Feature (AF) and a Visual Feature (VF) using an autoencoder equipped with a carefully designed Hierarchical Decouple-Fusion (HDF) module. In this formulation, the AF primarily governs the adversarial capability of an image, while the VF largely preserves its visual appearance. To enable feature disentanglement and image reconstruction, we jointly train two autoencoders, one for the clean image domain and one for the adversarial image domain (i.e., cross-domain), using paired clean images and their corresponding Adversarial Examples (AEs) generated by white-box attacks on available surrogate models. During the black-box attack stage, DifAttack++ iteratively optimizes the AF based on query feedback from the victim model, while keeping the VF fixed, until a successful AE is obtained. Extensive experimental results demonstrate that DifAttack++ achieves superior ASR and query efficiency compared to state-of-the-art methods, while producing AEs with comparable visual quality. Our code is available at https://github.com/csjunjun/DifAttackPlus.git.
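
The attack stage described above (update only the AF from score feedback, keep the VF fixed, stop at the first successful AE) can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the DifAttack++ code: the names `attack_af`, `toy_decode` and `toy_victim`, the linear "decoder" and "victim", and the hyperparameters are all hypothetical. The abstract does not say which optimizer is used, so a simple NES-style antithetic gradient estimator stands in for it here.

```python
import numpy as np

def margin(scores, true_label):
    """Log-score gap between the true class and the best other class.
    Positive: still classified correctly. Negative: the attack succeeded."""
    log_s = np.log(scores + 1e-12)
    return log_s[true_label] - np.delete(log_s, true_label).max()

def attack_af(af, vf, decode, query_scores, true_label,
              max_queries=1000, pop=10, sigma=0.1, lr=0.1, seed=0):
    """Optimize only the adversarial feature `af`; `vf` is never modified.
    Returns (adversarial_image or None, number_of_queries_used)."""
    rng = np.random.default_rng(seed)
    af = np.array(af, dtype=float)  # private copy, the caller's array is untouched
    queries = 0
    while queries < max_queries:
        image = decode(af, vf)
        scores = query_scores(image)  # one query to the black-box victim
        queries += 1
        if margin(scores, true_label) < 0:
            return image, queries     # victim no longer predicts true_label
        if queries + 2 * pop > max_queries:
            break                     # not enough budget for another estimate
        grad = np.zeros_like(af)
        for _ in range(pop):
            noise = rng.standard_normal(af.shape)
            m_plus = margin(query_scores(decode(af + sigma * noise, vf)), true_label)
            m_minus = margin(query_scores(decode(af - sigma * noise, vf)), true_label)
            queries += 2
            grad += (m_plus - m_minus) * noise
        grad /= 2 * pop * sigma       # NES-style estimate of d(margin)/d(af)
        af -= lr * grad               # descend the margin
    return None, queries

# Hypothetical stand-ins for the trained decoder and the victim model.
A = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
W = np.eye(3, 4)

def toy_decode(af, vf):
    # Toy "fusion": the AF shifts the image that the VF alone would produce.
    return vf + A @ af

def toy_victim(image):
    # Toy score-based victim: softmax over linear logits.
    logits = W @ image
    e = np.exp(logits - logits.max())
    return e / e.sum()

vf = np.array([0.9, 0.3, 0.2, 0.5])   # fixed visual feature
af0 = np.zeros(2)                      # initial adversarial feature
adv_image, queries_used = attack_af(af0, vf, toy_decode, toy_victim, true_label=0)
```

In this toy setup the loop returns as soon as the victim's top class is no longer `true_label`, which mirrors the "until a successful AE is obtained" stopping rule. The sketch does not model the trained autoencoders, the HDF module, or any constraint on perturbation size.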

Jun Liu, Jiantao Zhou, Jiandian Zeng, Jinyu Tian, Isao Echizen • 2024

Related benchmarks

Task	Dataset	Result
Targeted Score-based Black-box Attack	ImageNet	ASR 100	96
Untargeted Score-based Black-box Attack	ImageNet	ASR 100	96
Untargeted Adversarial Attack	ImageNet (test)	--	26
Targeted Score-based Black-box Attack	Food101	ASR 90	6
Targeted Score-based Black-box Attack	ObjectNet	ASR 57.5	6
Untargeted Score-based Black-box Attack	ObjectNet	ASR 100	6
Untargeted Score-based Black-box Attack	Food101	ASR 100	6
Targeted Black-box Attack	Imagga API	ASR 72.7	5
Untargeted Black-box Attack	Imagga API	ASR 86.7	5
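
For readers unfamiliar with the metric, ASR is commonly computed as the percentage of attacked images for which an adversarial example is found within a fixed query budget. This page does not state the budget or the exact evaluation protocol behind the rows above, so the sketch below is only a generic illustration: the helper name `attack_success_rate`, the budget, and the per-image query counts are all hypothetical and are not results from the paper.

```python
def attack_success_rate(queries_per_image, budget):
    """queries_per_image: queries used per attacked image, or None if the
    attack never succeeded. Returns (ASR in percent, mean queries over successes)."""
    if not queries_per_image:
        raise ValueError("need at least one attacked image")
    successes = [q for q in queries_per_image if q is not None and q <= budget]
    asr = 100.0 * len(successes) / len(queries_per_image)
    avg_queries = sum(successes) / len(successes) if successes else None
    return asr, avg_queries

# Made-up outcomes for four images: three successes and one failure.
asr, avg_q = attack_success_rate([120, 800, None, 45], budget=1000)
```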
