Robust Backdoor Detection for Deep Learning via Topological Evolution Dynamics

About

A backdoor attack in deep learning inserts a hidden backdoor in the model to trigger malicious behavior upon specific input patterns. Existing detection approaches assume a metric space (for either the original inputs or their latent representations) in which normal samples and malicious samples are separable. We show that this assumption has a severe limitation by introducing a novel SSDT (Source-Specific and Dynamic-Triggers) backdoor, which obscures the difference between normal samples and malicious samples. To overcome this limitation, we move beyond looking for a perfect metric space that would work for different deep-learning models, and instead resort to more robust topological constructs. We propose TED (Topological Evolution Dynamics) as a model-agnostic basis for robust backdoor detection. The main idea of TED is to view a deep-learning model as a dynamical system that evolves inputs to outputs. In such a dynamical system, a benign input follows a natural evolution trajectory similar to other benign inputs. In contrast, a malicious sample displays a distinct trajectory, since it starts close to benign samples but eventually shifts towards the neighborhood of attacker-specified target samples to activate the backdoor. Extensive evaluations are conducted on vision and natural language datasets across different network architectures. The results demonstrate that TED not only achieves a high detection rate, but also significantly outperforms existing state-of-the-art detection approaches, particularly in addressing the sophisticated SSDT attack. The code to reproduce the results is made public on GitHub.
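The trajectory idea described above can be illustrated with a small NumPy sketch. This is not the authors' released code. It assumes per-layer activations have already been extracted, and it summarizes each input by the rank, at every layer, of the nearest benign reference sample that shares the input's predicted label. The function name `rank_trajectory`, the toy data, and the use of Euclidean distance are all assumptions made for illustration; the outlier-scoring step that would turn trajectories into a detection decision is omitted.

```python
import numpy as np

def rank_trajectory(layer_acts, pred_label, ref_layer_acts, ref_labels):
    """Hypothetical helper: for one input, return at each layer the 1-based
    rank of the nearest reference sample whose label equals pred_label.

    Assumes the reference set contains at least one sample of pred_label.
    """
    ranks = []
    for x, ref in zip(layer_acts, ref_layer_acts):
        dists = np.linalg.norm(ref - x, axis=1)   # distance to every reference sample
        order = np.argsort(dists)                 # nearest reference sample first
        same_class = ref_labels[order] == pred_label
        ranks.append(int(np.argmax(same_class)) + 1)
    return np.array(ranks)

# Toy benign reference set: 3 samples of class 0 and 3 of class 1,
# with the same 2-D activations at both "layers" (illustrative only).
ref = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
ref_acts = [ref, ref]
ref_labels = np.array([0, 0, 0, 1, 1, 1])

# A benign class-0 input stays near class-0 samples at every layer.
benign = [np.array([0.1, 0.1]), np.array([0.1, 0.1])]
# A triggered input starts near class 0 but ends near the target class 1.
malicious = [np.array([0.1, 0.1]), np.array([5.1, 5.1])]

benign_traj = rank_trajectory(benign, 0, ref_acts, ref_labels)
malicious_traj = rank_trajectory(malicious, 1, ref_acts, ref_labels)
print(benign_traj, malicious_traj)
```

In this toy setup the benign input keeps a low rank at both layers, while the triggered input has a high rank at the first layer (its predicted class is far away there) that collapses to 1 at the last layer. A detector in this spirit would flag trajectories that deviate from those of known benign inputs.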

Xiaoxing Mo, Yechao Zhang, Leo Yu Zhang, Wei Luo, Nan Sun, Shengshan Hu, Shang Gao, Yang Xiang • 2023

Related benchmarks

Task	Dataset	Result
Time Series Forecasting	PeMS03	MAEC 18.434	39
Backdoor Defense in Time Series Forecasting	PEMS03 v1 (full)	MAE (c) 18.434	16
Backdoor Detection	Simple IHU Gemma 2B	AUROC 0.972	15
Backdoor Detection	Simple IHU Llama 8B	AUROC 0.83	15
Backdoor Detection	CIFAR-10 imbalanced µ=0.9, ρ=2 (test)	Badnets TPR 96.2	13
Backdoor Sample Detection	CIFAR-10 balanced rho=1 (train test)	Badnets TPR 100	13
Backdoor Detection	CIFAR-10 imbalanced µ=0.9, ρ=100 (test)	Badnets TPR 76.9	13
Backdoor Sample Detection	CIFAR-10 imbalanced mu=0.9, rho=10 (train test)	Badnets TPR 90.4	13
Backdoor Sample Detection	CIFAR-10 imbalanced mu=0.9, rho=200 (train test)	Badnets TPR 46.7	13
Backdoor Detection	Complex SWE	AUROC 0.761	5

