
SCALE-UP: An Efficient Black-box Input-level Backdoor Detection via Analyzing Scaled Prediction Consistency

About

Deep neural networks (DNNs) are vulnerable to backdoor attacks, in which adversaries embed a hidden backdoor trigger during training so that they can maliciously manipulate the model's predictions. These attacks pose serious threats to DNN applications in the real-world machine-learning-as-a-service (MLaaS) setting, where the deployed model is fully black-box and users can only query it and obtain its predictions. Many defenses have been proposed to reduce backdoor threats, but almost none of them can be adopted in MLaaS scenarios because they require access to, or even modification of, the suspicious model. To address this problem, we propose SCALE-UP, a simple yet effective black-box input-level backdoor detection method that requires only the predicted labels. Specifically, we identify and filter malicious test samples by analyzing the consistency of their predictions as their pixel values are amplified. Our defense is motivated by an intriguing observation, dubbed scaled prediction consistency: when all pixel values are amplified, the predictions of poisoned samples are significantly more consistent than those of benign ones. We also provide theoretical foundations that explain this phenomenon. Extensive experiments on benchmark datasets verify the effectiveness and efficiency of our defense and its resistance to potential adaptive attacks. Our code is available at https://github.com/JunfengGo/SCALE-UP.
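The label-only detection rule described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the black-box model is exposed as a function `predict` that returns a hard label, that amplification multiplies every pixel by a factor n and clips the result to [0, 1], and it uses a hypothetical scale set (3, 5, 7, 9, 11) and a hypothetical threshold of 0.5. The score is the fraction of amplified copies whose predicted label matches the label of the original input; inputs with a high score are flagged as likely poisoned.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: a grayscale image as rows of pixel values in [0, 1].
Image = List[List[float]]

# Assumed amplification factors; the paper's exact set may differ.
DEFAULT_SCALES: Sequence[float] = (3, 5, 7, 9, 11)


def scale_image(x: Image, n: float) -> Image:
    """Multiply every pixel by n and clip to [0, 1] (clipping is an assumption)."""
    return [[min(max(p * n, 0.0), 1.0) for p in row] for row in x]


def spc_score(
    predict: Callable[[Image], int],
    x: Image,
    scales: Sequence[float] = DEFAULT_SCALES,
) -> float:
    """Scaled prediction consistency: share of amplified copies keeping x's label."""
    if not scales:
        raise ValueError("scales must be non-empty")
    base_label = predict(x)  # one label-only query on the original input
    hits = sum(1 for n in scales if predict(scale_image(x, n)) == base_label)
    return hits / len(scales)


def is_poisoned(
    predict: Callable[[Image], int],
    x: Image,
    threshold: float = 0.5,  # hypothetical decision threshold
    scales: Sequence[float] = DEFAULT_SCALES,
) -> bool:
    """Flag x as a likely backdoor sample when its SPC score reaches the threshold."""
    return spc_score(predict, x, scales) >= threshold


# --- Toy usage: a hypothetical backdoored classifier, for illustration only ---
def toy_predict(x: Image) -> int:
    if x[0][0] >= 0.9:  # saturated top-left "trigger" pixel forces target class 0
        return 0
    mean = sum(sum(row) for row in x) / (len(x) * len(x[0]))
    return 1 if mean < 0.5 else 2  # otherwise the label depends on overall brightness


poisoned = [[1.0, 0.2], [0.2, 0.2]]
benign = [[0.2, 0.2], [0.2, 0.2]]
print(spc_score(toy_predict, poisoned))  # 1.0
print(spc_score(toy_predict, benign))    # 0.0
```

In the toy usage, the hypothetical `toy_predict` stands in for a backdoored model whose trigger is a saturated top-left pixel: the triggered input keeps its label under every amplification, while the benign input's label drifts as its pixels grow brighter. Only predicted labels are queried, which matches the label-only MLaaS setting targeted here.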

Junfeng Guo, Yiming Li, Xun Chen, Hanqing Guo, Lichao Sun, Cong Liu · 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Backdoor Defense | CIFAR10 (test) | ASR | 37.5 | 322 |
| Backdoor Detection | CIFAR-10 | -- | -- | 120 |
| Backdoor Detection | GTSRB | TPR | 100 | 39 |
| Backdoor Defense | GTSRB 1% poison rate (test) | Clean Accuracy | 97.3 | 27 |
| Backdoor Sample Detection | CIFAR-10 balanced rho=1 (train test) | BadNets TPR | 100 | 13 |
| Backdoor Detection | CIFAR-10 imbalanced µ=0.9, ρ=100 (test) | BadNets TPR | 90.7 | 13 |
| Backdoor Sample Detection | CIFAR-10 imbalanced mu=0.9, rho=10 (train test) | BadNets TPR | 95.9 | 13 |
| Backdoor Sample Detection | CIFAR-10 imbalanced mu=0.9, rho=200 (train test) | BadNets TPR | 67.3 | 13 |
| Backdoor Detection | CIFAR-10 imbalanced µ=0.9, ρ=2 (test) | BadNets TPR | 95.8 | 13 |
| Backdoor Detection | Tiny-ImageNet | TPR | 99.9 | 12 |
