
Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder

About

Deep neural networks are vulnerable to backdoor attacks, where an adversary manipulates model behavior by overlaying images with special triggers. Existing backdoor defense methods often require access to validation data and model parameters, which is impractical in many real-world applications, e.g., when the model is provided as a cloud service. In this paper, we address the practical task of blind backdoor defense at test time, in particular for local attacks and black-box models. The true label of every test image must be recovered on the fly from a suspicious model, regardless of whether the image is benign. We consider test-time image purification that incapacitates local triggers while keeping semantic content intact. Because trigger patterns and sizes vary widely, heuristic trigger search does not scale. We circumvent this barrier by leveraging the strong reconstruction power of generative models, and propose Blind Defense with Masked AutoEncoder (BDMAE). BDMAE detects possible local triggers using image structural similarity and label consistency between the test image and MAE restorations. The detection results are then refined by considering trigger topology. Finally, we adaptively fuse MAE restorations into a purified image for prediction. Extensive experiments under different backdoor settings validate its effectiveness and generalizability.
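The detect-then-fuse pipeline can be illustrated with a minimal sketch. This is not the paper's implementation: the MAE is replaced here by a hypothetical `restore_with_mask` stub that fills masked pixels with the unmasked mean, the label-consistency and topology-refinement steps are omitted, and all function names are illustrative. It shows only the core idea that pixels changing strongly under restoration (e.g. a local trigger patch) score high and get replaced.

```python
import numpy as np

def restore_with_mask(image, mask):
    # Hypothetical stand-in for an MAE restoration: fill the masked
    # region with the mean of the unmasked pixels. A real MAE would
    # inpaint masked patches from surrounding context.
    restored = image.astype(float).copy()
    restored[mask] = image[~mask].mean() if (~mask).any() else 0.0
    return restored

def trigger_score_map(image, patch=4):
    # Score each pixel by its dissimilarity to the restoration that
    # masked it out. A tiled, non-overlapping grid of masks covers
    # the image once, so every pixel is masked exactly one time.
    h, w = image.shape
    score = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            mask = np.zeros((h, w), dtype=bool)
            mask[y:y+patch, x:x+patch] = True
            restored = restore_with_mask(image, mask)
            score[mask] = np.abs(image - restored)[mask]
    return score

def purify(image, score, threshold):
    # Fuse: replace likely-trigger pixels (high score) with a
    # restoration in which exactly those pixels were masked out.
    return restore_with_mask(image, score > threshold)

# Toy example: an all-zero image with a bright 2x2 "trigger" patch.
img = np.zeros((8, 8))
img[:2, :2] = 1.0
scores = trigger_score_map(img, patch=4)
clean = purify(img, scores, threshold=0.5)  # trigger pixels removed
```

In the actual method, the score also incorporates whether the black-box model's predicted label flips when a region is restored, and connected high-score regions are refined via trigger topology before fusion.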

Tao Sun, Lu Pang, Weimin Lyu, Chao Chen, Haibin Ling • 2023

Related benchmarks

Task             | Dataset                      | Result              | Rank
Backdoor Defense | CIFAR10 (test)               | ASR 42.8            | 322
Backdoor Defense | GTSRB 1% poison rate (test)  | Clean Accuracy 96.8 | 27
