
Density-aware Sample-specific Attack

About

Despite recent progress in backdoor attacks, existing methods remain susceptible to post-training defenses that erase the backdoor through fine-tuning or pruning. We revisit the core objectives of backdoor attacks and derive principled criteria characterizing optimal sample-specific trigger construction under a Bayes-optimal model of the victim's training. Our analysis reveals that both attack success and clean-accuracy preservation are simultaneously optimized when triggered samples are steered into low-density regions of the clean data distribution, a distributional condition that controls all moments of the poisoned distribution at once rather than a handful of input-space summary statistics. We introduce a bilevel optimization framework that estimates density ratios via conditional time-score matching and optimizes a mixture-model objective to place triggered samples in these sparse regions. Extensive evaluations on MNIST, CIFAR-10, GTSRB, and TinyImageNet demonstrate that our method achieves above 99% attack success rate (ASR) before defense and retains 50 to 85 percentage points higher post-defense ASR than the strongest baselines under fine-tuning defenses. Against neuron-pruning defenses, the method exhibits complete immunity, with zero neurons identified for removal across all pruning thresholds. These results expose a fundamental gap in current defense paradigms and underscore the need for defenses that operate beyond the support of the clean distribution.
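The idea of steering a triggered sample into a low-density region of the clean data can be illustrated with a toy sketch. This is not the authors' method: it swaps conditional time-score matching for a plain Gaussian kernel density estimate, drops the bilevel and mixture-model objective and the victim model entirely, and uses 2-D synthetic data. All names (`log_kde`, `grad_log_kde`, `steer_to_low_density`, `budget`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_kde(x, data, bandwidth):
    """Log of a Gaussian kernel density estimate at x, up to an additive constant."""
    a = -np.sum((data - x) ** 2, axis=1) / (2.0 * bandwidth ** 2)
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))  # numerically stable log-mean-exp

def grad_log_kde(x, data, bandwidth):
    """Gradient of log_kde with respect to x."""
    diff = data - x                                        # shape (n, d)
    a = -np.sum(diff ** 2, axis=1) / (2.0 * bandwidth ** 2)
    w = np.exp(a - a.max())
    w /= w.sum()                                           # softmax weights over clean points
    return (w[:, None] * diff).sum(axis=0) / bandwidth ** 2

def steer_to_low_density(x, data, bandwidth=1.0, step=0.1, n_steps=50, budget=2.0):
    """Perturb x inside an L2 ball of radius `budget` to lower the estimated clean density."""
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        g = grad_log_kde(x + delta, data, bandwidth)
        delta = delta - step * g                           # descend the log-density
        norm = np.linalg.norm(delta)
        if norm > budget:
            delta *= budget / norm                         # project back onto the ball
    return x + delta

# Assumption: synthetic "clean data" drawn from a standard 2-D Gaussian.
rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 2))
x_clean = np.array([0.5, 0.0])
x_triggered = steer_to_low_density(x_clean, clean)

print("log-density before:", log_kde(x_clean, clean, 1.0))
print("log-density after: ", log_kde(x_triggered, clean, 1.0))
```

The projection step stands in for a perturbation budget on the trigger. In a real attack the density model, the budget, and the objective would all differ, so the sketch only shows the direction of the optimization, not its scale or its effect on a trained classifier.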

Qiyuan Wang, Yao Li, Raymond K. W. Wong • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Backdoor Defense | Tiny-ImageNet | Accuracy | 86.6 | 196 |
| Backdoor Attack | CIFAR10 | Attack Success Rate | 100 | 158 |
| Backdoor Attack | GTSRB | Attack Success Rate | 99.48 | 142 |
| Image Classification | GTSRB | CA | 95.1 | 121 |
| Backdoor Attack | MNIST (test) | Classification Accuracy (C-Acc) | 98.9 | 88 |
| Image Classification | MNIST | Standard Accuracy | 98.9 | 54 |
| Image Classification | TinyImageNet | C-Acc | 86.6 | 42 |
| Image Classification | CIFAR-10 | C-Acc | 91.3 | 42 |
