
Meta-Learning the Search Distribution of Black-Box Random Search Based Adversarial Attacks

About

Adversarial attacks based on randomized search schemes have recently achieved state-of-the-art results in black-box robustness evaluation. However, as we demonstrate in this work, their efficiency in different query budget regimes depends on manual design and heuristic tuning of the underlying proposal distributions. We study how this issue can be addressed by adapting the proposal distribution online, based on the information obtained during the attack. We consider Square Attack, a state-of-the-art score-based black-box attack, and demonstrate how its performance can be improved by a learned controller that adjusts the parameters of the proposal distribution online during the attack. We train the controller using gradient-based end-to-end training on a CIFAR10 model with white-box access. We demonstrate that plugging the learned controller into the attack consistently improves its black-box robustness estimate in different query regimes, by up to 20%, for a wide range of models with black-box access. We further show that the learned adaptation principle transfers well to other data distributions, such as CIFAR100 or ImageNet, and to the targeted attack setting.
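The idea of adapting a proposal distribution online can be illustrated with a minimal sketch. This is not the authors' method: the model is a toy linear classifier, the square-shaped updates of Square Attack are replaced with random coordinate subsets, and `toy_controller` is a hypothetical hand-written rule standing in for the learned controller described above. All function names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def margin_loss(w, x, y):
    # Toy score of a binary linear model; a positive value means x is
    # classified correctly, so the attack tries to decrease it.
    return y * float(w @ x)

def toy_controller(p, success_rate):
    # Hypothetical hand-written rule (NOT a learned controller): halve the
    # fraction of perturbed coordinates when recent proposals rarely help.
    if success_rate < 0.1:
        return max(p * 0.5, 0.01)
    return p

def random_search_attack(w, x, y, eps=0.1, n_steps=500, p_init=0.5,
                         window=20, seed=0):
    rng = np.random.default_rng(seed)
    d = x.size
    # Start from a random vertex of the L-infinity ball of radius eps.
    delta = rng.choice([-eps, eps], size=d)
    best = margin_loss(w, x + delta, y)
    p = p_init
    history = []
    for t in range(n_steps):
        # Proposal: resample a random subset of coordinates; p controls its size.
        k = max(1, int(round(p * d)))
        idx = rng.choice(d, size=k, replace=False)
        cand = delta.copy()
        cand[idx] = rng.choice([-eps, eps], size=k)
        loss = margin_loss(w, x + cand, y)  # one score query to the model
        improved = loss < best
        if improved:
            delta, best = cand, loss
        history.append(improved)
        # Online adaptation: update p from feedback observed during the attack.
        if (t + 1) % window == 0:
            p = toy_controller(p, np.mean(history[-window:]))
    return delta, best, p
```

In this sketch the only adapted parameter is `p`, the fraction of coordinates resampled per proposal. A fixed schedule would set `p` as a function of the step index alone, whereas the controller here reacts to the recent acceptance rate.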

Maksym Yatsura, Jan Hendrik Metzen, Matthias Hein • 2021

Related benchmarks

Task                            Dataset                       Metric                            Result  Rank
Image Classification            ImageNet 1000 images (val)    Robust Accuracy                   52.5    82
Targeted Adversarial Attack     ImageNet 1000 images (val)    Clean Accuracy                    77.6    24
Untargeted Adversarial Attack   ImageNet 1000 images (val)    Clean Accuracy                    77.6    24
Adversarial Robustness          CIFAR10 1000 images (test)    Robust Accuracy                   66.1    24
Adversarial Robustness          CIFAR100 1000 images (val)    Clean Accuracy                    70.25   24
Image Classification            CIFAR10 1000 images (val)     Clean Accuracy                    88.67   6
Robustness Evaluation           CIFAR10 RobustBench           Mean Robust Accuracy Improvement  4.29    4
