VAE-Inf: A statistically interpretable generative paradigm for imbalanced classification

About

Imbalanced classification remains a pervasive challenge in machine learning, particularly when minority samples are too scarce to support a robust discriminative boundary. In such extreme scenarios, conventional models often suffer from unstable decision boundaries and a lack of reliable error control. To bridge the gap between generative modeling and discriminative classification, we propose VAE-Inf, a two-stage framework that integrates deep representation learning with statistically interpretable hypothesis testing. In the first stage, we adopt a one-class modeling perspective by training a variational autoencoder (VAE) exclusively on majority-class data to capture the underlying reference distribution. The resulting latent posteriors are aggregated via a Wasserstein barycenter to construct a global Gaussian reference model, providing a geometrically principled baseline for the majority class. In the second stage, we transform this generative foundation into a discriminative classifier by fine-tuning the encoder with limited minority samples. This is achieved through a novel distribution-aware loss that enforces probabilistic separation between classes based on variance-normalized projection statistics. For inference, we introduce a projection-based score that admits a natural hypothesis testing interpretation, allowing for a distribution-free calibration procedure. This approach yields exact finite-sample control of the Type-I error (false positive rate) without relying on restrictive parametric assumptions. Extensive experiments on diverse real-world benchmarks demonstrate that our framework achieves competitive performance against other approaches. The code is available upon request.
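
Two steps from the abstract can be illustrated with a minimal sketch; it is not the authors' implementation, whose code is not public. It assumes the VAE posteriors are diagonal Gaussians, in which case the 2-Wasserstein barycenter is a Gaussian whose mean and standard deviation are the weighted averages of the individual means and standard deviations. It also assumes that the distribution-free calibration can be approximated by a split-conformal style order-statistic threshold on held-out majority scores. The function names `diag_gaussian_w2_barycenter` and `calibrate_threshold` are hypothetical, and the paper's projection-based score is not reproduced here.

```python
import math

import numpy as np


def diag_gaussian_w2_barycenter(means, stds, weights=None):
    """2-Wasserstein barycenter of diagonal Gaussians N(mu_i, diag(sigma_i^2)).

    Assumption: posteriors are diagonal, so their covariances commute and the
    barycenter has averaged mean and averaged standard deviation.
    """
    means = np.asarray(means, dtype=float)  # shape (n, d)
    stds = np.asarray(stds, dtype=float)    # shape (n, d)
    n = means.shape[0]
    if weights is None:
        weights = np.full(n, 1.0 / n)       # uniform weights by default
    weights = np.asarray(weights, dtype=float)
    return weights @ means, weights @ stds


def calibrate_threshold(majority_scores, alpha):
    """Order-statistic threshold from held-out majority-class scores.

    Assumption: larger scores indicate the minority class, and calibration
    scores are exchangeable with future majority scores. Flagging a sample
    when its score exceeds the returned threshold then has a false positive
    probability of at most alpha.
    """
    scores = np.sort(np.asarray(majority_scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points to certify level alpha: flag nothing.
        return math.inf
    return scores[k - 1]


# Usage with toy numbers (illustrative only, not data from the paper).
ref_mean, ref_std = diag_gaussian_w2_barycenter(
    means=[[0.0, 0.0], [2.0, 4.0]],
    stds=[[1.0, 1.0], [3.0, 1.0]],
)
threshold = calibrate_threshold(np.arange(1.0, 10.0), alpha=0.5)
```

The threshold uses the `ceil((n + 1)(1 - alpha))`-th smallest calibration score, so very small `alpha` values require correspondingly many majority-class calibration samples before any sample can be flagged.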

Hongfei Wu, Ruijian Han, Yancheng Yuan • 2026

Related benchmarks

Task	Dataset	Metric	Result
Anomaly Detection	CIFAR-10	AUC	71.46	136
Anomaly Detection	Backdoor rho = 0.20% (test)	AUC-ROC	99.26	6
Anomaly Detection	Census rho = 6.20% (test)	AUC-ROC	0.9388	6
Anomaly Detection	Census rho = 0.21% (test)	AUC-ROC	90.61	6
Imbalanced Classification	Backdoor 0.20% (test)	Type-II Error (at Type-I=0.01)	5.36	6
Imbalanced Classification	Census 6.20% (test)	Type-II Error (at Type-I=0.01)	63.3	6
Imbalanced Classification	Census 0.21% (test)	Type-II Error (at Type-I=0.01)	0.6828	6
Anomaly Detection	Credit Card rho = 0.17% (test)	AUC-ROC	97.48	6
Anomaly Detection	Backdoor rho = 2.44% (test)	AUC-ROC	99.31	6
Imbalanced Classification	Credit Card 0.17% (test)	Type-II Error (at Type-I=0.01)	10.2	6

Showing 10 of 15 rows
