
Adversarial Neuron Pruning Purifies Backdoored Deep Models

About

As deep neural networks (DNNs) grow larger, their computational requirements become enormous, making it increasingly popular to outsource training. Training on a third-party platform, however, introduces the risk that a malicious trainer returns a backdoored DNN, which behaves normally on clean samples but outputs targeted misclassifications whenever a trigger appears at test time. Without any knowledge of the trigger, it is difficult to distinguish backdoored DNNs from benign ones or to repair them. In this paper, we first identify an unexpected sensitivity of backdoored DNNs: they collapse much more easily, and tend to predict the target label on clean samples, when their neurons are adversarially perturbed. Based on these observations, we propose a novel model repairing method, termed Adversarial Neuron Pruning (ANP), which prunes the most sensitive neurons to purify the injected backdoor. Experiments show that, even with an extremely small amount of clean data (e.g., 1%), ANP effectively removes the injected backdoor without causing obvious performance degradation.
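The core idea can be illustrated with a toy sketch. This is not the authors' implementation (ANP optimizes a continuous neuron mask under worst-case multiplicative weight perturbations); instead, as a simplified proxy, each neuron's sensitivity is estimated by brute force as the largest loss increase when that neuron's weights are scaled by (1 ± eps), and the most sensitive neurons are then pruned. The loss function and layer shapes are hypothetical.

```python
import numpy as np

def neuron_sensitivity(W, x, y, loss_fn, eps=0.4):
    """Estimate per-neuron sensitivity (one neuron = one row of W):
    the worst-case loss increase when that neuron's weights are
    adversarially scaled by (1 + eps) or (1 - eps)."""
    base = loss_fn(W, x, y)
    sens = []
    for i in range(W.shape[0]):
        worst = base
        for sign in (+1.0, -1.0):
            Wp = W.copy()
            Wp[i] *= (1.0 + sign * eps)  # perturb only neuron i
            worst = max(worst, loss_fn(Wp, x, y))
        sens.append(worst - base)  # >= 0 by construction
    return np.array(sens)

def prune_sensitive(W, sens, k):
    """Zero out the k most sensitive neurons (rows of W)."""
    Wp = W.copy()
    Wp[np.argsort(sens)[-k:]] = 0.0
    return Wp
```

On a real backdoored model, the neurons that carry the backdoor tend to rank highest under such a sensitivity measure, which is why pruning them removes the trigger behavior while leaving clean accuracy largely intact.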

Dongxian Wu, Yisen Wang • 2021

Related benchmarks

Task                  Dataset                       Result                      Rank
Backdoor Defense      CIFAR10 (test)                ASR 0.05                    322
Backdoor Defense      GTSRB (test)                  ASR 0.00e+0                 127
Backdoor Defense      Tiny-ImageNet                 Accuracy 50.56              102
Image Classification  MNIST                         Clean Accuracy 95           71
Backdoor Defense      CIFAR10 (train)               ASR 0.31                    63
Image Classification  CINIC-10                      Accuracy 66                 59
Backdoor Defense      Tiny ImageNet (test)          Accuracy 63.85              47
Backdoor Defense      CIFAR-10 (test)               --                          40
Backdoor Defense      CIFAR-10 Blended v1 (test)    Clean Accuracy 94.51        34
Backdoor Defense      GTSRB BadNets (test)          Attack Success Rate 99.06   22
Showing 10 of 51 rows

Other info

Code
