WaNet -- Imperceptible Warping-based Backdoor Attack
About
With the thriving of deep learning and the widespread practice of using pre-trained networks, backdoor attacks have become an increasing security threat drawing much research interest in recent years. A third-party model can be poisoned during training so that it works well under normal conditions but behaves maliciously when a trigger pattern appears. However, existing backdoor attacks are all built on noise perturbation triggers, making them noticeable to humans. In this paper, we instead propose using warping-based triggers. The proposed backdoor outperforms previous methods in a human inspection test by a wide margin, proving its stealthiness. To make such models undetectable by machine defenders, we propose a novel training mode, called the "noise mode". The trained networks successfully attack and bypass state-of-the-art defense methods on standard classification datasets, including MNIST, CIFAR-10, GTSRB, and CelebA. Behavior analyses show that our backdoors are transparent to network inspection, further proving this novel attack mechanism's efficiency.
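The core idea of a warping-based trigger is to replace an additive noise patch with a smooth, low-frequency geometric distortion: a small random control grid is upsampled into a dense flow field, which then resamples the image. The sketch below illustrates this idea with NumPy on a single-channel image; the function names, grid size, and `strength` parameter are illustrative assumptions, not the paper's exact implementation (which builds the warping field differently and uses bilinear grid sampling in a deep-learning framework).

```python
import numpy as np

def bilinear_upsample(ctrl, size):
    """Upsample a (k, k, 2) control grid to a dense (size, size, 2)
    warping field via separable bilinear interpolation.
    (Illustrative stand-in for the paper's field construction.)"""
    k = ctrl.shape[0]
    coords = np.linspace(0, k - 1, size)
    i0 = np.floor(coords).astype(int)
    i1 = np.minimum(i0 + 1, k - 1)
    t = coords - i0
    # Interpolate along rows, then along columns.
    rows = ctrl[i0] * (1 - t)[:, None, None] + ctrl[i1] * t[:, None, None]
    return (rows[:, i0] * (1 - t)[None, :, None]
            + rows[:, i1] * t[None, :, None])

def warp_image(img, field, strength=0.5):
    """Resample a (H, W) image through a smooth displacement field.
    Small `strength` keeps the distortion nearly imperceptible."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    ys = np.clip(ys + strength * field[..., 0], 0, h - 1)
    xs = np.clip(xs + strength * field[..., 1], 0, w - 1)
    # Bilinear sampling at the displaced coordinates.
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# Example: apply a fixed warp as a backdoor trigger to one image.
rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, (32, 32))          # stand-in for a CIFAR-scale image
ctrl = rng.uniform(-1.0, 1.0, (4, 4, 2))       # small random control grid
field = bilinear_upsample(ctrl, 32)            # dense, smooth warping field
triggered = warp_image(img, field, strength=0.5)
```

Because the displacement is sub-pixel in scale and spatially smooth, the warped image differs only slightly from the original, which is what makes the trigger hard for humans to spot while still being a consistent, learnable pattern for the network.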
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Backdoor Defense | CIFAR10 (test) | ASR: 0.54 | 322 |
| Image Classification | ImageNet V2 (test) | -- | 216 |
| Image Classification | ImageNet-A (test) | -- | 175 |
| Image Classification | ImageNet-Sketch (test) | -- | 153 |
| Image Classification | GTSRB | Natural Accuracy: 96.2 | 87 |
| Image Classification | GTSRB | CA: 95.99 | 79 |
| Image Classification | MNIST | Clean Accuracy: 97 | 71 |
| Backdoor Attack | CIFAR10 | Attack Success Rate: 12.53 | 70 |
| Backdoor Attack | GTSRB | Backdoor Accuracy: 98.28 | 59 |
| Backdoor Attack | CIFAR-10 (test) | Backdoor Accuracy: 91.22 | 54 |