Improving Black-box Adversarial Attacks with a Transfer-based Prior

About

We consider the black-box adversarial setting, where the adversary must generate adversarial perturbations without access to the target model's gradients. Previous methods approximate the gradient either with a transfer gradient from a surrogate white-box model or from query feedback. However, these methods often suffer from low attack success rates or poor query efficiency, since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior, given by the gradient of a surrogate model, is integrated into our algorithm through an optimal coefficient derived from a theoretical analysis. Extensive experiments demonstrate that our method requires far fewer queries to attack black-box models, with higher success rates, than alternative state-of-the-art methods.
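
The idea of combining a transfer-based prior with query feedback can be illustrated with a small sketch. This is not the authors' implementation. The function name prior_guided_rgf, the parameters lam, q, and sigma, and the specific way the prior direction is mixed with random directions are assumptions made for illustration. In particular, lam is a fixed, hand-picked weight here, whereas the abstract states that the method derives an optimal coefficient; that derivation is not reproduced.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(a):
    norm = math.sqrt(_dot(a, a))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in a]


def prior_guided_rgf(loss, x, prior_grad, lam=0.5, q=20, sigma=1e-4, rng=None):
    """Estimate the gradient of `loss` at `x` using q + 1 loss queries.

    Hypothetical sketch: `prior_grad` stands in for a surrogate model's
    gradient, and `lam` in [0, 1] is an assumed fixed weight on that prior
    (lam = 0 ignores the prior component, lam = 1 uses only the prior
    direction).
    """
    rng = rng or random.Random(0)
    v = _normalize(prior_grad)          # unit-norm prior direction
    base = loss(x)                      # one query at the current point
    estimate = [0.0] * len(x)
    for _ in range(q):
        # Random Gaussian direction with its component along v removed.
        xi = [rng.gauss(0.0, 1.0) for _ in x]
        proj = _dot(xi, v)
        xi_perp = _normalize([a - proj * b for a, b in zip(xi, v)])
        # Unit-norm query direction biased toward the prior.
        u = [math.sqrt(lam) * b + math.sqrt(1.0 - lam) * a
             for a, b in zip(xi_perp, v)]
        # Finite-difference slope along u, from one extra query.
        shifted = [a + sigma * b for a, b in zip(x, u)]
        slope = (loss(shifted) - base) / sigma
        estimate = [e + slope * b / q for e, b in zip(estimate, u)]
    return estimate
```

In an attack loop, loss would be the target model's score-based objective (queried as a black box) and prior_grad the gradient of the same objective on a surrogate model; the returned estimate would then drive a gradient-based update of the perturbation. The input needs at least two dimensions, since a random direction orthogonal to the prior must exist.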

Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu • 2019

Related benchmarks

Task	Dataset	Result
Targeted Score-based Black-box Attack	ImageNet	ASR 65.3	96
Untargeted Score-based Black-box Attack	ImageNet	ASR 98	96
Untargeted Adversarial Attack	ImageNet (test)	--	26
Black-box Targeted Adversarial Attack	MNIST (test)	Median Queries 777	10
Untargeted Adversarial Attack	VGG-19	Fooling Rate 93.5	5
Untargeted Adversarial Attack	DenseNet-121	Fooling Rate 92.9	5
Untargeted Adversarial Attack	ResNext-50	Fooling Rate 92.5	5
