# PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics

## About
Train-time data poisoning attacks threaten machine learning models by introducing adversarial examples during training, leading to misclassification. Current defense methods often reduce generalization performance, are attack-specific, and impose significant training overhead. To address this, we introduce a set of universal data purification methods using a stochastic transform, $\Psi(x)$, realized via iterative Langevin dynamics of Energy-Based Models (EBMs), Denoising Diffusion Probabilistic Models (DDPMs), or both. These approaches purify poisoned data with minimal impact on classifier generalization. Our specially trained EBMs and DDPMs provide state-of-the-art defense against various attacks (including Narcissus, Bullseye Polytope, Gradient Matching) on CIFAR-10, Tiny-ImageNet, and CINIC-10, without needing attack or classifier-specific information. We discuss performance trade-offs and show that our methods remain highly effective even with poisoned or distributionally shifted generative model training data.
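The purification transform $\Psi(x)$ described above can be sketched as iterative Langevin dynamics that nudges a (possibly poisoned) input toward high-density regions of a generative model. The sketch below is illustrative only: it uses a toy quadratic energy function in place of a trained EBM or DDPM score network, and the `energy`, `energy_grad`, and `purify` names, step counts, and noise scaling are all assumptions, not the paper's implementation.

```python
import numpy as np

def energy(x):
    # Toy quadratic energy with its minimum at the origin. A real defense
    # would use a trained EBM's energy network here (illustrative assumption).
    return 0.5 * np.sum(x ** 2)

def energy_grad(x):
    # Analytic gradient of the toy energy; in practice this would come from
    # automatic differentiation through the energy network.
    return x

def purify(x, steps=500, step_size=0.01, noise_scale=0.005, seed=0):
    """Stochastic transform Psi(x): Langevin dynamics that iteratively moves
    a sample toward high-probability (low-energy) regions of the model,
    washing out adversarial perturbations while preserving image content."""
    rng = np.random.default_rng(seed)
    x = np.array(x, dtype=float)
    for _ in range(steps):
        # Gradient step toward lower energy plus scaled Gaussian noise.
        # The noise is damped here (low-temperature sampling) so the demo
        # converges crisply; the exact schedule is an assumption.
        x = (x - step_size * energy_grad(x)
             + np.sqrt(2 * step_size) * noise_scale
             * rng.standard_normal(x.shape))
    return x

# A perturbed ("poisoned") point is pulled back toward the data mode at 0.
x_poisoned = np.array([3.0, -2.0])
x_clean = purify(x_poisoned)
```

Because the dynamics depend only on the generative model, the same `purify` call can be applied to any training set before classifier training, which is what makes the defense attack- and classifier-agnostic.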
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Data Poisoning Defense | CIFAR-10 (test) | Test Accuracy | 93.81 | 72 |
| Backdoor Defense | Tiny ImageNet (test) | Accuracy | 63.27 | 47 |
| NTGA Data-Availability Defense | CIFAR-10 (test) | Avg Natural Accuracy | 85.22 | 9 |
| Poison Defense | CINIC-10 (test) | Avg Poison Success | 4.76 | 6 |
| Poison Defense | CIFAR-100 (test) | Poison Success Rate | 0.09 | 6 |