
Neural Trojans

About

Neural networks are becoming more capable at pattern recognition, but they are also becoming larger and deeper, so the effort needed to train them increases dramatically. In many cases, it is more practical to use a neural network intellectual property (IP) that an IP vendor has already trained. Because the user has no visibility into the training process, the neural IP can carry security threats: the IP vendor (attacker) may embed hidden malicious functionality, i.e., neural Trojans, into the neural IP. We show that this is an effective attack and provide three mitigation techniques: input anomaly detection, re-training, and input preprocessing. All three techniques are shown to be effective. The input anomaly detection approach detects 99.8% of Trojan triggers, although with a 12.2% false-positive rate. The re-training approach prevents 94.1% of Trojan triggers from activating the Trojan, although it requires that the neural IP be reconfigurable. The input preprocessing approach renders 90.2% of Trojan triggers ineffective and requires no assumption about the neural IP.

Yuntao Liu, Yang Xie, Ankur Srivastava • 2017
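
The abstract reports the input anomaly detector as a trade-off between the share of Trojan triggers it catches and the share of legitimate inputs it wrongly flags. The sketch below shows one way those two rates can be computed. It is not the authors' detector: the centroid-distance score, the threshold, and all data points are hypothetical and chosen only to make the arithmetic visible.

```python
import math


def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def detection_rates(inputs, is_trigger, center, threshold):
    """Flag inputs whose distance from `center` exceeds `threshold`.

    Returns (detection_rate, false_positive_rate):
      detection_rate      = flagged triggers / all triggers
      false_positive_rate = flagged legitimate inputs / all legitimate inputs
    """
    flagged = [math.dist(x, center) > threshold for x in inputs]
    n_trigger = sum(is_trigger)
    n_legit = len(is_trigger) - n_trigger
    true_pos = sum(f and t for f, t in zip(flagged, is_trigger))
    false_pos = sum(f and not t for f, t in zip(flagged, is_trigger))
    return true_pos / n_trigger, false_pos / n_legit


# Hypothetical legitimate samples used to define "normal" inputs.
legit_reference = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
center = centroid(legit_reference)  # [0.5, 0.5]

# Hypothetical evaluation set: three legitimate inputs, two Trojan triggers.
inputs = [[0.5, 0.5], [1.2, 0.9], [2.0, 2.0], [3.0, 3.0], [4.0, 0.5]]
is_trigger = [False, False, False, True, True]

det_rate, fp_rate = detection_rates(inputs, is_trigger, center, threshold=1.5)
print(f"detection rate: {det_rate:.1%}, false-positive rate: {fp_rate:.1%}")
```

In this toy setup, lowering the threshold flags more triggers but also more legitimate inputs, which is the same tension behind the 99.8% detection and 12.2% false-positive figures quoted above.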

Related benchmarks

Task | Dataset | Result | Rank
Backdoor Defense | CIFAR10 (test) | ASR 4.14 | 322
Backdoor Defense | Tiny ImageNet (test) | Accuracy 59.98 | 47
Backdoor Defense | GTSRB DynamicAtt attack | Accuracy 97.1 | 8
Backdoor Defense | GTSRB Badnet attack | Accuracy 95.01 | 8
Backdoor Defense | GTSRB WaNet attack | Accuracy 96.7 | 8
Image Classification | CIFAR-10 all-to-all setting, WaNet attack (test) | Accuracy 93.37 | 8
Backdoor Defense | GTSRB Blend attack | Accuracy 90.68 | 8
Backdoor Defense | GTSRB SIG attack | Accuracy 91.63 | 8
Image Classification | CIFAR-10 all-to-all setting DynamicAtt attack (test) | Accuracy 92.05 | 8
Image Classification | CIFAR-10 all-to-all setting, Badnet attack (test) | Accuracy 85.54 | 8
Showing 10 of 12 rows
