WaNet -- Imperceptible Warping-based Backdoor Attack
About
With the thriving of deep learning and the widespread practice of using pre-trained networks, backdoor attacks have become an increasing security threat drawing many research interests in recent years. A third-party model can be poisoned during training so that it works well under normal conditions but behaves maliciously when a trigger pattern appears. However, existing backdoor attacks are all built on noise-perturbation triggers, making them noticeable to humans. In this paper, we instead propose using warping-based triggers. The proposed backdoor outperforms previous methods in a human inspection test by a wide margin, proving its stealthiness. To make such models undetectable by machine defenders, we propose a novel training mode, called the "noise mode". The trained networks successfully attack and bypass state-of-the-art defense methods on standard classification datasets, including MNIST, CIFAR-10, GTSRB, and CelebA. Behavior analyses show that our backdoors are transparent to network inspection, further proving this novel attack mechanism's efficiency.
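The core idea of a warping-based trigger can be illustrated with a small sketch: a smooth, low-magnitude offset field displaces pixel coordinates, and the image is resampled with bilinear interpolation. This is only a hedged illustration of the general technique, not the paper's actual implementation; the sinusoidal field, the `strength` parameter, and the helper names `bilinear_sample` and `warp` are all illustrative choices, not names from the paper.

```python
import math

def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at continuous coords (x, y),
    clamping out-of-range coordinates to the image border."""
    h, w = len(img), len(img[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    x0, y0 = max(x0, 0), max(y0, 0)
    dx, dy = x - x0, y - y0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bot = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bot * dy

def warp(img, strength=0.5):
    """Apply a smooth sinusoidal warping field as the trigger.
    For a small `strength`, the distortion is hard for humans to notice,
    which is the intuition behind warping-based triggers."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Illustrative displacement field: smooth and sub-pixel in scale.
            ox = strength * math.sin(2 * math.pi * y / h)
            oy = strength * math.sin(2 * math.pi * x / w)
            out[y][x] = bilinear_sample(img, x + ox, y + oy)
    return out
```

In a poisoning pipeline, `warp` would be applied to a fraction of the training images whose labels are switched to the attacker's target class, so the model learns to associate the subtle geometric distortion, rather than any visible patch, with that class.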
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Backdoor Defense | CIFAR-10 (test) | ASR | 0.54 | 322 |
| Image Classification | ImageNet V2 (test) | -- | -- | 181 |
| Image Classification | ImageNet-A (test) | -- | -- | 154 |
| Image Classification | ImageNet-Sketch (test) | -- | -- | 132 |
| Image Classification | GTSRB | Natural Accuracy | 96.2 | 87 |
| Image Classification | GTSRB | CA | 95.99 | 79 |
| Image Classification | MNIST | Clean Accuracy | 97 | 71 |
| Image-Text Retrieval | COCO (test) | Recall@1 | 39.24 | 37 |
| Backdoor Re-activation Attack | CIFAR-10 (test) | Performance | 98.9 | 36 |
| Backdoor Attack | CIFAR-10 (test) | Backdoor Accuracy | 91.22 | 30 |