BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
About
Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a *BadNet*) that has state-of-the-art performance on the user's training and validation samples but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example by creating a backdoored handwritten-digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we also show that the backdoor in our U.S. street sign detector can persist even if the network is later retrained for another task, causing an average accuracy drop of 25% when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and, because the behavior of neural networks is difficult to explicate, stealthy. This work motivates further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.
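The core attack the abstract describes is data poisoning: a small fraction of the training set is stamped with a fixed trigger pattern and relabeled to an attacker-chosen target class, so the trained model behaves normally on clean inputs but misclassifies triggered ones. The sketch below illustrates only this poisoning step on NumPy arrays; the function names, the corner-patch trigger, and the 10% poisoning rate are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def add_trigger(image, patch_size=3, value=1.0):
    # Stamp a small bright square (the backdoor trigger) into the
    # bottom-right corner of a single-channel image scaled to [0, 1].
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:] = value
    return poisoned

def poison_dataset(images, labels, target_label, poison_fraction=0.1, seed=0):
    # Return a training set in which a random fraction of samples
    # carries the trigger and has its label flipped to target_label.
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = add_trigger(images[i])
        labels[i] = target_label
    return images, labels

# Tiny demo on random stand-in "digit" images (28x28, like MNIST).
X = np.random.rand(100, 28, 28)
y = np.random.randint(0, 10, size=100)
X_poisoned, y_poisoned = poison_dataset(X, y, target_label=7)
```

A network trained on `X_poisoned, y_poisoned` would be fit with an ordinary training loop; the attack lives entirely in the data, which is why it survives a victim's standard validation on clean samples.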
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-10 (test) | Accuracy (Clean) | 94 | 273 |
| Image Classification | SVHN (test) | Accuracy | 91 | 199 |
| Mathematical Reasoning | GSM8K | -- | -- | 177 |
| Code Generation | MBPP | Accuracy (%) | 39.2 | 146 |
| Robot Manipulation | LIBERO (test) | Average Success Rate | 7.8 | 142 |
| Image Captioning | Flickr30k (test) | CIDEr | 92.7 | 103 |
| Image Classification | GTSRB | CA | 88.42 | 79 |
| Topic Classification | AG's News (test) | CACC | 94.18 | 43 |
| Backdoor Attack | FaceForensics++ | BA | 98.79 | 35 |
| Image Classification | MNIST (test) | Accuracy | 97 | 38 |