Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks

About

Deep neural networks (DNNs) provide excellent performance across a wide range of classification tasks, but their training requires high computational resources and is often outsourced to third parties. Recent work has shown that outsourced training introduces the risk that a malicious trainer will return a backdoored DNN that behaves normally on most inputs but causes targeted misclassifications or degrades the accuracy of the network when a trigger known only to the attacker is present. In this paper, we provide the first effective defenses against backdoor attacks on DNNs. We implement three backdoor attacks from prior work and use them to investigate two promising defenses, pruning and fine-tuning. We show that neither, by itself, is sufficient to defend against sophisticated attackers. We then evaluate fine-pruning, a combination of pruning and fine-tuning, and show that it successfully weakens or even eliminates the backdoors, i.e., in some cases reducing the attack success rate to 0% with only a 0.4% drop in accuracy for clean (non-triggering) inputs. Our work provides the first step toward defenses against backdoor attacks in deep neural networks.

Kang Liu, Brendan Dolan-Gavitt, Siddharth Garg • 2018
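
The abstract names the defense but not its mechanics, so the following is only a rough sketch of how the pruning half of fine-pruning and the attack success rate (ASR) metric could be expressed. It assumes, beyond what the abstract states, that channels are ranked by their mean activation on clean inputs and that the least active ones are pruned first; the fine-tuning half (continuing training on clean data) is not shown. All function names and the toy numbers are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def mean_channel_activations(activations: Sequence[Sequence[float]]) -> List[float]:
    """Average each channel's activation over a batch of clean inputs.

    activations[i][c] is the activation of channel c on clean input i.
    """
    num_inputs = len(activations)
    num_channels = len(activations[0])
    return [
        sum(row[c] for row in activations) / num_inputs
        for c in range(num_channels)
    ]


def pruning_mask(mean_acts: Sequence[float], prune_fraction: float) -> List[bool]:
    """Return a keep-mask (True = keep, False = pruned).

    Assumption: the channels with the lowest mean clean activation are
    removed first, since they contribute least to clean-input behavior.
    """
    num_prune = int(len(mean_acts) * prune_fraction)
    order = sorted(range(len(mean_acts)), key=lambda c: mean_acts[c])
    pruned = set(order[:num_prune])
    return [c not in pruned for c in range(len(mean_acts))]


def attack_success_rate(predictions: Sequence[int], target_label: int) -> float:
    """Fraction of triggered inputs classified as the attacker's target label
    (one common definition for targeted backdoors)."""
    return sum(p == target_label for p in predictions) / len(predictions)


# Toy data: 2 clean inputs, 4 channels (hypothetical values).
clean_acts = [
    [0.9, 0.0, 0.4, 0.1],
    [0.7, 0.0, 0.6, 0.3],
]
mask = pruning_mask(mean_channel_activations(clean_acts), prune_fraction=0.5)

# Toy predictions on 4 triggered inputs; the hypothetical target label is 7.
asr = attack_success_rate([7, 7, 3, 7], target_label=7)
```

In a real network the mask would be applied to a layer's channels before fine-tuning on clean data, and ASR would be measured on triggered test inputs before and after the defense.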

Related benchmarks

Task	Dataset	Result
Backdoor Defense	CIFAR10 (test)	ASR 0.00	327
Arithmetic Reasoning	GSM8K	--	272
Backdoor Defense	Tiny-ImageNet	Accuracy 52.38	196
Question Answering	OpenBookQA	Accuracy 43.8	145
Text Classification	SST-2	Accuracy 96.32	133
Backdoor Defense	GTSRB (test)	ASR 2.23	127
Backdoor Defense	AGNews	Attack Success Rate 7.07	105
Sentiment Classification	SST-2 64 instances (test)	Accuracy 92.2	80
Backdoor Defense	Average of four datasets (test)	Accuracy 87.5	76
Image Classification	MNIST	Clean Accuracy 97	71

Showing 10 of 80 rows
