Combating Adversaries with Anti-Adversaries
About
Deep neural networks are vulnerable to small input perturbations known as adversarial attacks. Inspired by the fact that these adversaries are constructed by iteratively minimizing the confidence of a network for the true class label, we propose the anti-adversary layer, aimed at countering this effect. In particular, our layer generates an input perturbation in the opposite direction of the adversarial one and feeds the classifier a perturbed version of the input. Our approach is training-free and theoretically supported. We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models, and we conduct large-scale experiments ranging from black-box to adaptive attacks on CIFAR10, CIFAR100, and ImageNet. Our layer significantly enhances model robustness at no cost to clean accuracy.
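The core mechanism can be sketched on a toy model: where an adversarial (FGSM-style) step *descends* the confidence of a class, the anti-adversary perturbation *ascends* the confidence of the classifier's own predicted class before the input reaches the classifier. The sketch below uses a plain linear classifier so the gradient is available in closed form; all names (`anti_adversary_step`, `W`, `b`, `eps`) and the single-step signed-gradient form are illustrative assumptions, not the paper's actual implementation (which applies the idea to deep networks via autodiff).

```python
import numpy as np

def logits(x, W, b):
    """Scores of a toy linear classifier; a deep net would replace this."""
    return W @ x + b

def anti_adversary_step(x, W, b, eps=0.1):
    """One signed-gradient step that raises the predicted class's score.

    This is the opposite of an FGSM attack step: instead of descending the
    confidence of the label, we ascend the confidence of the model's own
    prediction. For a linear model the gradient of the predicted-class
    logit w.r.t. the input is simply the corresponding weight row.
    """
    y_hat = int(np.argmax(logits(x, W, b)))   # model's own prediction
    grad = W[y_hat]                           # d(logit_y_hat)/dx
    return x + eps * np.sign(grad)            # ascend, not descend

# Illustrative toy data (hypothetical, for demonstration only).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x = rng.normal(size=5)

z_before = logits(x, W, b)
k = int(np.argmax(z_before))
x_anti = anti_adversary_step(x, W, b)
z_after = logits(x_anti, W, b)
print(z_before[k], z_after[k])  # predicted-class score strictly increases
```

The predicted-class logit increases by exactly `eps * ||W[y_hat]||_1` for this linear model, so the step provably pushes the input away from the decision boundary the adversary would exploit; the classifier then sees `x_anti` instead of `x`.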
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | Flowers102 | Clean Accuracy | 82.4 | 49 |
| Image Classification | StanfordCars | Clean Accuracy | 76.8 | 40 |
| Classification | PCAM | Clean Accuracy | 50.2 | 39 |
| Image Classification | CIFAR10 | Clean Accuracy | 89.3 | 37 |
| Classification | FGVCAircraft | Robust Accuracy | 10.7 | 30 |
| Image Classification | OxfordPets | Robust Accuracy | 61.1 | 27 |
| Image Classification | CIFAR100 | Clean Accuracy | 64.7 | 27 |
| Image Classification | Food101 | Clean Accuracy | 87.7 | 25 |
| Image Classification | Caltech-256 | Clean Accuracy | 88 | 20 |
| Image Classification | General-ImageNet | Clean Accuracy | 82.5 | 20 |