Neural Trojans
About
Neural networks are demonstrating ever stronger pattern-recognition capabilities, but they are also becoming larger and deeper, so the effort needed to train a network has increased dramatically. In many cases, it is more practical to use a neural network intellectual property (IP) that an IP vendor has already trained. Because the user has no visibility into the training process, the neural IP may carry security threats: the IP vendor (attacker) may embed hidden malicious functionality, i.e. neural Trojans, into the neural IP. We show that this is an effective attack and provide three mitigation techniques: input anomaly detection, re-training, and input preprocessing. All three techniques are shown to be effective. The input anomaly detection approach detects 99.8% of Trojan triggers, although with a 12.2% false-positive rate. The re-training approach prevents 94.1% of Trojan triggers from activating the Trojan, although it requires that the neural IP be reconfigurable. The input preprocessing approach renders 90.2% of Trojan triggers ineffective and requires no assumption about the neural IP.
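To make the two figures quoted for input anomaly detection concrete, the sketch below shows one common way a detection rate and a false-positive rate are computed from a detector's per-input decisions. It is an illustration only, not the paper's evaluation code: the function name `detection_metrics` and its two arguments are hypothetical, and it assumes that each input carries a ground-truth label saying whether it contains a Trojan trigger.

```python
from typing import Sequence, Tuple


def detection_metrics(
    flagged: Sequence[bool], is_trigger: Sequence[bool]
) -> Tuple[float, float]:
    """Return (detection_rate, false_positive_rate).

    flagged[i]    -- True if the (hypothetical) detector flagged input i
    is_trigger[i] -- True if input i really contains a Trojan trigger
    """
    if len(flagged) != len(is_trigger):
        raise ValueError("flagged and is_trigger must have the same length")

    n_trigger = sum(1 for t in is_trigger if t)
    n_clean = len(is_trigger) - n_trigger
    if n_trigger == 0 or n_clean == 0:
        raise ValueError("need at least one trigger input and one clean input")

    # Trigger inputs that were flagged (true positives).
    true_pos = sum(1 for f, t in zip(flagged, is_trigger) if f and t)
    # Clean inputs that were flagged anyway (false positives).
    false_pos = sum(1 for f, t in zip(flagged, is_trigger) if f and not t)

    return true_pos / n_trigger, false_pos / n_clean
```

Under this definition, a 99.8% detection rate with a 12.2% false-positive rate would mean that nearly all trigger inputs are flagged, while roughly one in eight legitimate inputs is flagged as well.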
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Backdoor Defense | CIFAR10 (test) | ASR | 4.14 | 322 |
| Backdoor Defense | Tiny ImageNet (test) | Accuracy | 59.98 | 47 |
| Backdoor Defense | GTSRB DynamicAtt attack | Accuracy | 97.1 | 8 |
| Backdoor Defense | GTSRB Badnet attack | Accuracy | 95.01 | 8 |
| Backdoor Defense | GTSRB WaNet attack | Accuracy | 96.7 | 8 |
| Image Classification | CIFAR-10 all-to-all setting, WaNet attack (test) | Accuracy | 93.37 | 8 |
| Backdoor Defense | GTSRB Blend attack | Accuracy | 90.68 | 8 |
| Backdoor Defense | GTSRB SIG attack | Accuracy | 91.63 | 8 |
| Image Classification | CIFAR-10 all-to-all setting, DynamicAtt attack (test) | Accuracy | 92.05 | 8 |
| Image Classification | CIFAR-10 all-to-all setting, Badnet attack (test) | Accuracy | 85.54 | 8 |