RPP: A Certified Poisoned-Sample Detection Framework for Backdoor Attacks under Dataset Imbalance

About

Deep neural networks are highly susceptible to backdoor attacks, yet most defense methods to date assume balanced data, overlooking the pervasive class imbalance in real-world scenarios that can amplify backdoor threats. This paper presents the first in-depth investigation of how dataset imbalance amplifies backdoor vulnerability, showing that (i) imbalance induces a majority-class bias that increases susceptibility and (ii) conventional defenses degrade significantly as the imbalance grows. To address this, we propose Randomized Probability Perturbation (RPP), a certified poisoned-sample detection framework that operates in a black-box setting using only model output probabilities. For any inspected sample, RPP determines whether the input has been backdoor-manipulated, while offering provable within-domain detectability guarantees and a probabilistic upper bound on the false positive rate. Extensive experiments on five benchmarks (MNIST, SVHN, CIFAR-10, TinyImageNet and ImageNet10) covering 10 backdoor attacks and 12 baseline defenses show that RPP achieves significantly higher detection accuracy than state-of-the-art defenses, particularly under dataset imbalance. RPP establishes a theoretical and practical foundation for defending against backdoor attacks in real-world environments with imbalanced data.

Miao Lin, Feng Yu, Rui Ning, Lusi Li, Jiawei Chen, Qian Lou, Mengxin Zheng, Chunsheng Xin, Hongyi Wu • 2026

Related benchmarks

Task	Dataset	Result
Backdoor Detection	CIFAR-10 imbalanced µ=0.9, ρ=2 (test)	BadNets TPR 96.7	13
Backdoor Detection	CIFAR-10 imbalanced µ=0.9, ρ=100 (test)	BadNets TPR 100	13
Backdoor Sample Detection	CIFAR-10 imbalanced µ=0.9, ρ=10 (train test)	BadNets TPR 96.7	13
Backdoor Sample Detection	CIFAR-10 imbalanced µ=0.9, ρ=200 (train test)	BadNets TPR 83.3	13
Backdoor Sample Detection	CIFAR-10 balanced ρ=1 (train test)	BadNets TPR 98.5	13
Backdoor Attack Detection	iNaturalist	TPR 87.1	6
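
The results above are reported as TPR (true positive rate), and the abstract additionally refers to the false positive rate. As a reading aid, here is a minimal sketch of how these two rates are conventionally computed for a poisoned-sample detector. The function name `detection_rates` and the toy labels are illustrative assumptions, not taken from the paper, and the function returns fractions, whereas the table values appear to be percentages.

```python
def detection_rates(is_poisoned, is_flagged):
    """Return (TPR, FPR) for a poisoned-sample detector.

    is_poisoned: ground truth per sample (True if the sample carries a trigger).
    is_flagged:  detector decision per sample (True if flagged as poisoned).
    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(is_poisoned) != len(is_flagged):
        raise ValueError("inputs must have the same length")

    tp = fn = fp = tn = 0
    for poisoned, flagged in zip(is_poisoned, is_flagged):
        if poisoned and flagged:
            tp += 1  # poisoned sample correctly flagged
        elif poisoned:
            fn += 1  # poisoned sample missed
        elif flagged:
            fp += 1  # clean sample wrongly flagged
        else:
            tn += 1  # clean sample correctly passed

    # TPR = TP / (TP + FN); FPR = FP / (FP + TN). NaN if a class is absent.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


# Toy data (made up): 3 poisoned samples followed by 5 clean samples.
truth = [True, True, True, False, False, False, False, False]
flags = [True, True, False, False, False, True, False, False]
tpr, fpr = detection_rates(truth, flags)
print(f"TPR = {100 * tpr:.1f}%, FPR = {100 * fpr:.1f}%")
```

Under dataset imbalance it can be useful to compute these rates per class as well, since an aggregate TPR may hide poor detection on minority classes.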
