
Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder

About

Deep neural networks are vulnerable to backdoor attacks, where an adversary manipulates model behavior by overlaying images with special triggers. Existing backdoor defense methods often require access to validation data and model parameters, which is impractical in many real-world applications, e.g., when the model is provided as a cloud service. In this paper, we address the practical task of blind backdoor defense at test time, in particular for local attacks and black-box models. The true label of every test image must be recovered on the fly from a suspicious model, regardless of whether the image is benign. We consider test-time image purification that incapacitates local triggers while keeping semantic content intact. Because trigger patterns and sizes vary widely, heuristic trigger search does not scale. We circumvent this barrier by leveraging the strong reconstruction power of generative models, and propose Blind Defense with Masked AutoEncoder (BDMAE). BDMAE detects possible local triggers using image structural similarity and label consistency between the test image and MAE restorations. The detection results are then refined by considering trigger topology. Finally, we adaptively fuse MAE restorations into a purified image for prediction. Extensive experiments under different backdoor settings validate its effectiveness and generalizability.
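The detect-then-fuse pipeline can be illustrated with a minimal sketch. This is not the paper's implementation: the MAE is replaced here by a hypothetical `restore_with_mask` stub that fills masked pixels with the unmasked mean, the label-consistency and topology-refinement steps are omitted, and all function names are illustrative. It shows only the core idea that pixels changing strongly under restoration (e.g. a local trigger patch) score high and get replaced.

```python
import numpy as np

def restore_with_mask(image, mask):
    # Hypothetical stand-in for an MAE restoration: fill the masked
    # region with the mean of the unmasked pixels. A real MAE would
    # inpaint masked patches from surrounding context.
    restored = image.astype(float).copy()
    restored[mask] = image[~mask].mean() if (~mask).any() else 0.0
    return restored

def trigger_score_map(image, patch=4):
    # Score each pixel by its dissimilarity to the restoration that
    # masked it out. A tiled, non-overlapping grid of masks covers
    # the image once, so every pixel is masked exactly one time.
    h, w = image.shape
    score = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            mask = np.zeros((h, w), dtype=bool)
            mask[y:y+patch, x:x+patch] = True
            restored = restore_with_mask(image, mask)
            score[mask] = np.abs(image - restored)[mask]
    return score

def purify(image, score, threshold):
    # Fuse: replace likely-trigger pixels (high score) with a
    # restoration in which exactly those pixels were masked out.
    return restore_with_mask(image, score > threshold)

# Toy example: an all-zero image with a bright 2x2 "trigger" patch.
img = np.zeros((8, 8))
img[:2, :2] = 1.0
scores = trigger_score_map(img, patch=4)
clean = purify(img, scores, threshold=0.5)  # trigger pixels removed
```

In the actual method, the score also incorporates whether the black-box model's predicted label flips when a region is restored, and connected high-score regions are refined via trigger topology before fusion.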

Tao Sun, Lu Pang, Weimin Lyu, Chao Chen, Haibin Ling • 2023

Related benchmarks

Task             | Dataset                      | Result              | Rank
Backdoor Defense | CIFAR10 (test)               | ASR 42.8            | 322
Backdoor Defense | GTSRB 1% poison rate (test)  | Clean Accuracy 96.8 | 27
