
PATE-TabTransGAN: Differentially Private Synthetic Tabular Data Generation via Transformer-Based Student Discrimination

About

Generating high-fidelity synthetic tabular data under formal differential privacy guarantees remains an open challenge. Methods that provide strong theoretical protection typically sacrifice the modeling of inter-feature dependencies required for realistic synthesis, while architectures that excel at capturing complex column relationships offer only empirical privacy guarantees. We present PATE-TabTransGAN, a generative framework that integrates the Private Aggregation of Teacher Ensembles (PATE) mechanism with a Transformer-based student discriminator to address both requirements jointly, and employs a GNMax RDP accountant for numerically stable privacy accounting. An ensemble of Logistic Regression teachers trained on disjoint partitions supervises the student via noisily aggregated labels, and a residual generator is optimized against this differentially private student, inheriting formal (ε, δ)-DP guarantees by post-processing. PATE-TabTransGAN was compared with PATE-GAN, DP-GAN, and DP-CTGAN, which are considered state-of-the-art in differentially private tabular synthesis. In experiments on four tabular benchmarks (Adult, Breast, Cardio, Cervical), PATE-TabTransGAN attains the best or tied-best AUROC on all four datasets. On AUCPR it matches the strongest baseline on Cardio, leads on Cervical, and trails on Breast; on Adult, we demonstrate that AUCPR is highly sensitive to positive-class convention, and that the observed gap is consistent with a convention difference between evaluation pipelines rather than a synthesis deficit.
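The noisy aggregation step mentioned in the abstract can be illustrated with a minimal sketch of a GNMax-style vote aggregator: each teacher casts one vote, independent Gaussian noise is added to every per-class vote count, and the class with the largest noisy count becomes the label shown to the student. The function name `gnmax_aggregate`, the binary real/fake label set, the teacher count, and the noise scale `sigma` below are illustrative assumptions, not details taken from the paper, and the sketch omits privacy accounting entirely.

```python
import random
from collections import Counter


def gnmax_aggregate(teacher_votes, num_classes, sigma, rng):
    """Return the noisy-argmax label for a single query.

    teacher_votes: list of int class labels, one per teacher (hypothetical input).
    num_classes:   number of possible labels.
    sigma:         standard deviation of the Gaussian noise added to each count.
    rng:           a random.Random instance, passed in for reproducibility.
    """
    counts = Counter(teacher_votes)
    # Add independent Gaussian noise to every class count, including empty ones.
    noisy_counts = [
        counts.get(c, 0) + rng.gauss(0.0, sigma) for c in range(num_classes)
    ]
    # The released label is the class with the largest noisy count.
    return max(range(num_classes), key=lambda c: noisy_counts[c])


# Illustrative query: 50 hypothetical teachers, 40 of which vote 1 ("real").
rng = random.Random(0)
votes = [1] * 40 + [0] * 10
label = gnmax_aggregate(votes, num_classes=2, sigma=1.0, rng=rng)
print(label)
```

Only the aggregated label is released to the student, which is why anything trained against the student afterwards, such as the generator, can rely on the post-processing property. A larger `sigma` strengthens privacy per query but makes flipped labels more likely when the teacher vote is close.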

M. Youssef, M. Woźniak • 2026

Related benchmarks

Task                                        Dataset          Result               Rank
Tabular Data Utility                        Adult (test)     AUC 0.6487           18
Tabular Data Synthesis                      Cardio (test)    AUC 67.93            13
Tabular Classification                      Cervical         Mean AUCPR 14.15     4
Tabular Synthetic Data Utility Evaluation   Cervical (test)  Mean AUROC 0.5451    4
Tabular Classification                      cardio           Mean AUCPR 67.07     4
Tabular Classification                      Adult            Mean AUCPR 39.59     4
Tabular Synthetic Data Utility Evaluation   BREAST (test)    Mean AUROC 74.54     3
Tabular Classification                      Breast           Mean AUCPR 59.53     3
