Poisoning and Backdooring Contrastive Learning

About

Multimodal contrastive learning methods like CLIP train on noisy and uncurated training datasets. This is cheaper than labeling datasets manually, and even improves out-of-distribution robustness. We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.01% of a dataset (e.g., just 300 images of the 3 million-example Conceptual Captions dataset), we can cause the model to misclassify test images by overlaying a small patch. Targeted poisoning attacks, whereby the model misclassifies a particular test input with an adversarially desired label, are even easier, requiring control of only 0.0001% of the dataset (e.g., just three out of the 3 million images). Our attacks call into question whether training on noisy and uncurated Internet scrapes is desirable.

Nicholas Carlini, Andreas Terzis • 2021
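
The poisoning budgets quoted in the abstract, and the idea of overlaying a small patch on an image, can be illustrated with a minimal sketch. This is not the authors' code: the function names `poison_count` and `overlay_patch`, the 4x4 white patch, its top-left placement, and the 32x32 image size are all assumptions chosen for illustration only.

```python
import numpy as np


def poison_count(dataset_size: int, fraction: float) -> int:
    """Number of examples an attacker controls at a given poisoning fraction."""
    return round(dataset_size * fraction)


def overlay_patch(image: np.ndarray, patch: np.ndarray,
                  top: int = 0, left: int = 0) -> np.ndarray:
    """Return a copy of `image` with `patch` pasted at (top, left).

    The patch shape and position are hypothetical; the abstract only says
    that a small patch is overlaid on the image.
    """
    h, w = patch.shape[:2]
    if top < 0 or left < 0 or top + h > image.shape[0] or left + w > image.shape[1]:
        raise ValueError("patch does not fit inside the image")
    out = image.copy()  # leave the original image untouched
    out[top:top + h, left:left + w] = patch
    return out


# Budgets from the abstract, for a 3 million-example dataset.
backdoor_budget = poison_count(3_000_000, 0.01 / 100)    # 0.01% of the dataset
targeted_budget = poison_count(3_000_000, 0.0001 / 100)  # 0.0001% of the dataset

# Illustrative image and patch (assumed sizes and pixel values).
clean = np.zeros((32, 32, 3), dtype=np.uint8)
trigger = np.full((4, 4, 3), 255, dtype=np.uint8)
patched = overlay_patch(clean, trigger)
```

Here `backdoor_budget` and `targeted_budget` reproduce the 300-image and three-image figures given in the abstract, and `patched` differs from `clean` only inside the pasted region.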

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet100-B (test)	ASR 96	20
Image-Text Reranking	COCO (reranking)	Delta H@1 0.00e+0	12
Clean-generator Candidate Selection	COCO via DIFE	Delta Selection@1 0.136	6
Image-Text Retrieval	COCO Retrieval	Delta H@1 0.00e+0	6
Proxy Candidate Selection	COCO via DIFE	Delta Sel@1 0.0308	6
Retrieval	COCO	ΔH@1 0.00e+0	6
Backdoor Exposure Evaluation	CIFAR-10 (test)	Visual Classification Accuracy 100	5
