Invisible Backdoor Attack with Sample-Specific Triggers

About

Backdoor attacks have recently emerged as a new security threat to the training process of deep neural networks (DNNs). Attackers intend to inject hidden backdoors into DNNs, such that the attacked model performs well on benign samples, whereas its prediction will be maliciously changed if the hidden backdoors are activated by the attacker-defined trigger. Existing backdoor attacks usually adopt the setting that triggers are sample-agnostic, i.e., different poisoned samples contain the same trigger, so the attacks can be easily mitigated by current backdoor defenses. In this work, we explore a novel attack paradigm in which backdoor triggers are sample-specific. In our attack, we only need to modify certain training samples with invisible perturbations, without manipulating other training components (e.g., training loss and model structure) as required in many existing attacks. Specifically, inspired by recent advances in DNN-based image steganography, we generate sample-specific invisible additive noise as backdoor triggers by encoding an attacker-specified string into benign images through an encoder-decoder network. The mapping from the string to the target label is learned when DNNs are trained on the poisoned dataset. Extensive experiments on benchmark datasets verify the effectiveness of our method in attacking models with or without defenses.

Yuezun Li, Yiming Li, Baoyuan Wu, Longkang Li, Ran He, Siwei Lyu • 2020
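
The poisoning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_encoder` is a hypothetical stand-in for the trained encoder-decoder steganography network, and it only mimics the two properties the abstract names (the added noise is small, and it differs from image to image). The names `poison_dataset`, `rate`, `eps`, and the message string are likewise assumptions made for illustration.

```python
import hashlib

import numpy as np


def toy_encoder(image, message, eps=2 / 255):
    """Hypothetical stand-in for the trained steganography encoder.

    Adds bounded noise that depends on BOTH the image and the message,
    so every poisoned sample carries a different trigger. The real attack
    uses a learned encoder-decoder network instead of this hash-seeded noise.
    """
    digest = hashlib.sha256(message.encode() + image.tobytes()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "little"))
    noise = rng.uniform(-eps, eps, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)


def poison_dataset(images, labels, target_label, rate, message, seed=0):
    """Poison a fraction `rate` of the samples and relabel them as the target.

    Only the selected training samples are modified; the loss and the model
    structure are left untouched, matching the threat model in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_poison = int(round(rate * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)

    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    for i in idx:
        poisoned_images[i] = toy_encoder(images[i], message)
        poisoned_labels[i] = target_label
    return poisoned_images, poisoned_labels, idx


# Usage on random data shaped like a tiny image dataset (values in [0, 1]).
images = np.random.default_rng(1).uniform(0.0, 1.0, size=(20, 8, 8, 3))
labels = np.arange(20) % 10
p_images, p_labels, idx = poison_dataset(
    images, labels, target_label=0, rate=0.1, message="secret"
)
```

Training a classifier on `p_images` and `p_labels` is what would let the model associate the encoded string with the target label; that training step is omitted here.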

Related benchmarks

Task	Dataset	Result
Backdoor Defense	CIFAR10 (test)	ASR 2.72	333
Backdoor Defense	Tiny-ImageNet	Accuracy 83.09	267
Image Classification	ImageNet V2 (test)	--	232
Image Classification	ImageNet-A (test)	--	177
Backdoor Attack	CIFAR10	Attack Success Rate 98.11	158
Image Classification	ImageNet-Sketch (test)	--	153
Backdoor Attack	GTSRB	Attack Success Rate 96.37	142
Image Classification	GTSRB	CA 92.3	135
Image Classification	MNIST	Standard Accuracy 99.1	94
Backdoor Attack	MNIST (test)	Classification Accuracy (C-Acc) 99.14	88

Showing 10 of 44 rows
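
The two metric families in the table, clean accuracy and Attack Success Rate (ASR), can be computed as in the sketch below. It follows a common convention rather than any definition given on this page: ASR is taken as the percentage of triggered test samples predicted as the target label, and samples whose true class already equals the target are excluded. That exclusion and the function names are assumptions.

```python
def clean_accuracy(preds, labels):
    """Percentage of benign samples classified correctly."""
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return 100.0 * correct / len(labels)


def attack_success_rate(preds_on_triggered, true_labels, target_label):
    """Percentage of triggered samples predicted as the target label.

    Assumption: samples whose true class is already the target label are
    skipped, since predicting the target for them is not evidence of a backdoor.
    """
    kept = [p for p, y in zip(preds_on_triggered, true_labels) if y != target_label]
    hits = sum(1 for p in kept if p == target_label)
    return 100.0 * hits / len(kept)


# Usage with toy predictions.
acc = clean_accuracy([0, 1, 2, 2], [0, 1, 2, 3])
asr = attack_success_rate([0, 0, 0, 1], [0, 1, 2, 3], target_label=0)
```

Under this convention an attack aims for a high ASR while keeping clean accuracy close to that of an unpoisoned model, whereas a defense aims to push ASR down.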
