
Backdoor Attacks on Self-Supervised Learning

About

Large-scale unlabeled data has spurred recent progress in self-supervised learning methods that learn rich visual representations. State-of-the-art self-supervised methods for learning representations from images (e.g., MoCo, BYOL, MSF) use an inductive bias that random augmentations (e.g., random crops) of an image should produce similar embeddings. We show that such methods are vulnerable to backdoor attacks, in which an attacker poisons a small part of the unlabeled data by adding a trigger (an image patch chosen by the attacker) to the images. The model performs well on clean test images, but the attacker can manipulate the decision of the model by showing the trigger at test time. Backdoor attacks have been studied extensively in supervised learning and, to the best of our knowledge, we are the first to study them for self-supervised learning. Backdoor attacks are more practical in self-supervised learning, since the use of large unlabeled data makes data inspection to remove poisons prohibitive. We show that in our targeted attack, the attacker can produce many false positives for the target category by using the trigger at test time. We also propose a defense method based on knowledge distillation that succeeds in neutralizing the attack. Our code is available here: https://github.com/UMBCvision/SSL-Backdoor
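The poisoning step described above (pasting a small trigger patch into a fraction of the unlabeled training images) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `poison_images`, the array layout `(N, H, W, C)`, and the `poison_rate` parameter are all assumptions for the sake of the example.

```python
import numpy as np

def poison_images(images, trigger, poison_rate=0.05, rng=None):
    """Paste a trigger patch into a randomly chosen fraction of images.

    Hypothetical sketch of patch-based poisoning of unlabeled data.
    images:  float array of shape (N, H, W, C)
    trigger: float array of shape (h, w, C), the attacker's patch
    Returns a poisoned copy of the images and the poisoned indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    images = images.copy()
    n, H, W, _ = images.shape
    h, w, _ = trigger.shape
    # Poison only a small fraction of the dataset (at least one image).
    idx = rng.choice(n, size=max(1, int(poison_rate * n)), replace=False)
    for i in idx:
        # Random top-left corner so the patch fits inside the image.
        y = rng.integers(0, H - h + 1)
        x = rng.integers(0, W - w + 1)
        images[i, y:y + h, x:x + w, :] = trigger
    return images, idx
```

Because the data is unlabeled and large-scale, such patched images are unlikely to be caught by manual inspection, which is the practicality argument made in the abstract.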

Aniruddha Saha, Ajinkya Tejankar, Soroush Abbasi Koohpayegani, Hamed Pirsiavash• 2021

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | CIFAR-100 | Accuracy: 75.37 | 691 |
| Image Classification | Flowers (test) | -- | 173 |
| Image Classification | STL-10 | Accuracy: 62.89 | 129 |
| Image Classification | GTSRB | CA: 76.56 | 79 |
| Backdoor Attack | CIFAR-10 (test) | Backdoor Accuracy: 82.3 | 54 |
| Backdoor Attack Stealthiness Evaluation | CIFAR10 | SSIM: 0.8737 | 40 |
| Image Classification | Pets (test) | -- | 36 |
| Image Classification | ImageNet100-B (test) | ASR: 1.43e+3 | 20 |
| Image Classification | CIFAR-10 | ACC: 81.46 | 14 |
| Human Inspection | ImageNet Backdoor images (test) | Success Fooling Rate: 2.4 | 8 |

Showing 10 of 26 rows
