Robust Backdoor Detection for Deep Learning via Topological Evolution Dynamics
About
A backdoor attack in deep learning inserts a hidden backdoor in the model to trigger malicious behavior upon specific input patterns. Existing detection approaches assume a metric space (for either the original inputs or their latent representations) in which normal samples and malicious samples are separable. We show that this assumption has a severe limitation by introducing a novel SSDT (Source-Specific and Dynamic-Triggers) backdoor, which obscures the difference between normal samples and malicious samples. To overcome this limitation, we move beyond looking for a perfect metric space that would work for different deep-learning models, and instead resort to more robust topological constructs. We propose TED (Topological Evolution Dynamics) as a model-agnostic basis for robust backdoor detection. The main idea of TED is to view a deep-learning model as a dynamical system that evolves inputs to outputs. In such a dynamical system, a benign input follows a natural evolution trajectory similar to other benign inputs. In contrast, a malicious sample displays a distinct trajectory, since it starts close to benign samples but eventually shifts towards the neighborhood of attacker-specified target samples to activate the backdoor. Extensive evaluations are conducted on vision and natural language datasets across different network architectures. The results demonstrate that TED not only achieves a high detection rate, but also significantly outperforms existing state-of-the-art detection approaches, particularly in addressing the sophisticated SSDT attack. The code to reproduce the results is made public on GitHub.
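The trajectory idea above can be made concrete with a small sketch. This is an illustration only, not the authors' implementation: it assumes that a sample's "trajectory" is summarized, layer by layer, as the rank of the nearest benign reference sample that shares the model's predicted label. The function name `rank_trajectory`, the toy activations, and the use of Euclidean distance are all hypothetical choices made for this example.

```python
import math

def rank_trajectory(sample_acts, ref_acts, ref_labels, predicted_label):
    """Per-layer rank (1 = closest) of the nearest reference sample whose
    label equals the model's predicted label for the inspected sample.

    sample_acts: one activation vector per layer for the inspected sample
    ref_acts:    per layer, a list of activation vectors of benign references
    ref_labels:  labels of the benign references (same order in every layer)
    """
    if predicted_label not in ref_labels:
        raise ValueError("no reference sample carries the predicted label")
    ranks = []
    for x, refs in zip(sample_acts, ref_acts):
        # Sort reference indices by Euclidean distance to the sample (assumed metric).
        order = sorted(range(len(refs)), key=lambda i: math.dist(x, refs[i]))
        # Position of the first neighbor that shares the predicted label.
        rank = next(pos for pos, i in enumerate(order, start=1)
                    if ref_labels[i] == predicted_label)
        ranks.append(rank)
    return ranks

# Toy data (hypothetical): two layers, four benign references, two classes.
ref_labels = [0, 0, 1, 1]
layer_refs = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
ref_acts = [layer_refs, layer_refs]

# A benign class-1 sample stays near class-1 references in both layers.
benign = rank_trajectory([[5.0, 5.2], [5.0, 5.1]], ref_acts, ref_labels, 1)

# A triggered sample starts near class 0 but ends near the target class 1.
triggered = rank_trajectory([[0.1, 0.2], [5.0, 5.4]], ref_acts, ref_labels, 1)
```

In this toy setup the benign sample keeps rank 1 in both layers, while the triggered sample starts at rank 3 and only reaches rank 1 in the last layer. A detector in this spirit would flag trajectories that deviate from those of benign samples. How such outliers are scored is left open here.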
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Backdoor Detection | CIFAR-10 imbalanced µ=0.9, ρ=2 (test) | BadNets TPR 96.2 | 13 |
| Backdoor Sample Detection | CIFAR-10 balanced rho=1 (train test) | BadNets TPR 100 | 13 |
| Backdoor Detection | CIFAR-10 imbalanced µ=0.9, ρ=100 (test) | BadNets TPR 76.9 | 13 |
| Backdoor Sample Detection | CIFAR-10 imbalanced mu=0.9, rho=10 (train test) | BadNets TPR 90.4 | 13 |
| Backdoor Sample Detection | CIFAR-10 imbalanced mu=0.9, rho=200 (train test) | BadNets TPR 46.7 | 13 |