Density-aware Sample-specific Attack
About
Despite recent progress in backdoor attacks, existing methods remain susceptible to post-training defenses that erase the backdoor through fine-tuning or pruning. We revisit the core objectives of backdoor attacks and derive principled criteria characterizing optimal sample-specific trigger construction under a Bayes-optimal model of the victim's training. Our analysis reveals that attack success and clean-accuracy preservation are simultaneously optimized when triggered samples are steered into low-density regions of the clean data distribution, a distributional condition that controls all moments of the poisoned distribution at once rather than a handful of input-space summary statistics. We introduce a bilevel optimization framework that estimates density ratios via conditional time-score matching and optimizes a mixture-model objective to place triggered samples in these sparse regions. Extensive evaluations on MNIST, CIFAR-10, GTSRB, and TinyImageNet demonstrate that our method achieves an attack success rate (ASR) above 99% before defense and retains a post-defense ASR 50 to 85 percentage points higher than the strongest baselines under fine-tuning defenses. Against neuron-pruning defenses, the method exhibits complete immunity, with zero neurons identified for removal across all pruning thresholds. These results expose a fundamental gap in current defense paradigms and underscore the need for defenses that operate beyond the support of the clean distribution.
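The abstract reports its results in terms of attack success rate, clean accuracy, and a gap measured in percentage points. As a minimal sketch of how such evaluation metrics are commonly computed (this is not taken from the paper's code), the helpers below work on plain label lists. The function names are hypothetical, and excluding samples whose true label already equals the target class is an assumed convention that the paper may or may not follow.

```python
from typing import Sequence


def clean_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean (untriggered) samples classified correctly."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    triggered_preds: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered samples classified as the target label.

    Assumption: samples whose true label is already the target are
    excluded, since predicting the target for them is not evidence
    of a working backdoor.
    """
    if len(triggered_preds) != len(true_labels):
        raise ValueError("predictions and labels must have equal length")
    pairs = [
        (p, y) for p, y in zip(triggered_preds, true_labels) if y != target_label
    ]
    if not pairs:
        raise ValueError("no non-target samples to evaluate")
    return sum(p == target_label for p, _ in pairs) / len(pairs)


def percentage_point_gap(asr_a: float, asr_b: float) -> float:
    """Difference between two rates (given in [0, 1]) in percentage points."""
    return 100.0 * (asr_a - asr_b)
```

A percentage-point gap is an absolute difference between two rates, so a post-defense ASR of 0.90 against a baseline of 0.20 is a gap of about 70 points, not a 350% relative improvement.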
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Backdoor Defense | Tiny-ImageNet | Accuracy | 86.6 | 196 |
| Backdoor Attack | CIFAR10 | Attack Success Rate | 100 | 158 |
| Backdoor Attack | GTSRB | Attack Success Rate | 99.48 | 142 |
| Image Classification | GTSRB | CA | 95.1 | 121 |
| Backdoor Attack | MNIST (test) | Classification Accuracy (C-Acc) | 98.9 | 88 |
| Image Classification | MNIST | Standard Accuracy | 98.9 | 54 |
| Image Classification | TinyImageNet | C-Acc | 86.6 | 42 |
| Image Classification | CIFAR-10 | C-Acc | 91.3 | 42 |