Label-Consistent Backdoor Attacks

About

Deep neural networks have been demonstrated to be vulnerable to backdoor attacks. Specifically, by injecting a small number of maliciously constructed inputs into the training set, an adversary is able to plant a backdoor into the trained model. This backdoor can then be activated during inference by a backdoor trigger to fully control the model's behavior. While such attacks are very effective, they crucially rely on the adversary injecting arbitrary inputs that are---often blatantly---mislabeled. Such samples would raise suspicion upon human inspection, potentially revealing the attack. Thus, for backdoor attacks to remain undetected, it is crucial that they maintain label-consistency---the condition that injected inputs are consistent with their labels. In this work, we leverage adversarial perturbations and generative models to execute efficient, yet label-consistent, backdoor attacks. Our approach is based on injecting inputs that appear plausible, yet are hard to classify, hence causing the model to rely on the (easier-to-learn) backdoor trigger.
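
The abstract describes two ingredients: an adversarial perturbation that makes a correctly labeled image hard to classify, and a trigger pattern stamped on top. The sketch below illustrates that recipe (the adversarial-perturbation variant; the paper also uses generative models) in PyTorch. It is an illustration under stated assumptions, not the authors' released code: `model` (a classifier returning logits), the L-inf budget `eps`, and the 3x3 corner patch are all illustrative choices.

```python
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Maximize the loss on the *correct* labels within an L-inf ball,
    making the images harder to classify while keeping them label-consistent.
    Assumes `x` is a batch of target-class images in [0, 1] and `model`
    is in eval mode and returns logits."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # gradient ascent step
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)  # project onto eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                          # stay a valid image
        x_adv = x_adv.detach()
    return x_adv

def stamp_trigger(x, patch_size=3, value=1.0):
    """Overwrite a small bottom-right square with a fixed pattern (a
    hypothetical trigger; the paper explores other patterns and amplitudes)."""
    x = x.clone()
    x[..., -patch_size:, -patch_size:] = value
    return x

# Poisoned training samples keep their original, consistent label y:
# x_poison = stamp_trigger(pgd_perturb(model, x, y))
```

Because the perturbed images are genuinely hard to classify, the model leans on the easier-to-learn trigger, yet every injected sample still carries its correct label and survives human inspection.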

Alexander Turner, Dimitris Tsipras, Aleksander Madry • 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Backdoor Attack | FaceForensics++ | BA | 98.28 | 35 |
| Backdoor Attack | ImageNet 10-S | CAD | 1.17 | 13 |
| Backdoor Attack | CelebA-S | CAD | 2.48 | 13 |
| Backdoor Attack | Pets | CAD | -1.88 | 13 |
| Backdoor Attack | CIFAR-10-S | CAD | -1.01 | 13 |
| Backdoor Attack | Caltech-101 | CAD | -154 | 13 |
| Backdoor Attack | Cars | CAD | 0.23 | 13 |
| Backdoor Image Classification | CIFAR-10 (test) | BA (DCB) | 94.57 | 12 |
| Image Generation | CIFAR-10 (synthetic) | FID | 9.11 | 12 |
| Backdoor Attack | Backdoor Attack Evaluation Summary | Poison Rate | 0.4 | 10 |

CAD = Clean Attack Drop; FID = Fréchet Inception Distance.
(10 of 11 benchmark rows shown.)
