PATE-TabTransGAN: Differentially Private Synthetic Tabular Data Generation via Transformer-Based Student Discrimination
About
Generating high-fidelity synthetic tabular data under formal differential privacy guarantees remains an open challenge. Methods that provide strong theoretical protection typically sacrifice the modeling of inter-feature dependencies required for realistic synthesis, while architectures that excel at capturing complex column relationships offer only empirical privacy guarantees. We present PATE-TabTransGAN, a generative framework that integrates the Private Aggregation of Teacher Ensembles (PATE) mechanism with a Transformer-based student discriminator to address both requirements jointly, and that uses a GNMax Rényi differential privacy (RDP) accountant for numerically stable privacy accounting. An ensemble of Logistic Regression teachers, each trained on a disjoint partition of the data, supervises the student through noisily aggregated labels. A residual generator is then optimized against this differentially private student and inherits its formal (ε, δ)-DP guarantee by post-processing. We compare PATE-TabTransGAN with PATE-GAN, DP-GAN, and DP-CTGAN, which are considered state-of-the-art in differentially private tabular synthesis, on four tabular benchmarks (Adult, Breast, Cardio, Cervical). PATE-TabTransGAN attains the best or tied-best AUROC on all four datasets. On AUCPR it matches the strongest baseline on Cardio, leads on Cervical, and trails on Breast; on Adult, we demonstrate that AUCPR is highly sensitive to the positive-class convention and that the observed gap is consistent with a convention difference between evaluation pipelines rather than a synthesis deficit.
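To make the PATE step concrete, here is a minimal, stdlib-only sketch of GNMax label aggregation and a data-independent RDP cost for it. It is not the authors' implementation. The names `gnmax_label` and `gnmax_epsilon` are hypothetical, teacher votes are assumed to be precomputed (for example, by the Logistic Regression teachers), and the accountant uses only the data-independent bound in which each GNMax answer is (λ, λ/σ²)-RDP. That bound is looser than the data-dependent analysis a GNMax accountant may apply. Because every record sits in exactly one teacher's partition, it can change at most one vote, which is the sensitivity this bound assumes.

```python
import math
import random
from collections import Counter


# Hypothetical helper: GNMax aggregation for a single student query.
def gnmax_label(teacher_votes, num_classes, sigma, rng):
    """Return argmax_j (n_j + N(0, sigma^2)), where n_j counts teachers voting for class j."""
    counts = Counter(teacher_votes)
    noisy = [counts.get(j, 0) + rng.gauss(0.0, sigma) for j in range(num_classes)]
    return max(range(num_classes), key=lambda j: noisy[j])


# Hypothetical helper: data-independent accounting only (looser than data-dependent RDP).
def gnmax_epsilon(num_queries, sigma, delta, orders=None):
    """(eps, delta) cost of answering `num_queries` GNMax queries.

    Each answer is (lam, lam / sigma**2)-RDP; RDP costs add across queries,
    and (lam, r)-RDP implies (r + log(1/delta) / (lam - 1), delta)-DP.
    Returns the tightest eps over the candidate orders and the order achieving it.
    """
    if orders is None:
        orders = [1 + k / 10 for k in range(1, 100)] + list(range(11, 257))
    best_eps, best_order = math.inf, None
    for lam in orders:
        rdp = num_queries * lam / sigma ** 2          # composed RDP at order lam
        eps = rdp + math.log(1 / delta) / (lam - 1)   # RDP -> (eps, delta)-DP conversion
        if eps < best_eps:
            best_eps, best_order = eps, lam
    return best_eps, best_order


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical votes from 50 teachers on one student query (binary task).
    votes = [1] * 38 + [0] * 12
    print("noisy label:", gnmax_label(votes, num_classes=2, sigma=5.0, rng=rng))
    eps, lam = gnmax_epsilon(num_queries=100, sigma=40.0, delta=1e-5)
    print(f"eps = {eps:.3f} at order {lam}")  # eps = 1.760 at order 15
```

In this sketch only the student's queries to the teachers consume privacy budget. A generator trained solely against the student adds no further cost, which is the post-processing argument stated above.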
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Tabular Data Utility | Adult (test) | AUC | 0.6487 | 18 |
| Tabular Data Synthesis | Cardio (test) | AUC | 67.93 | 13 |
| Tabular Classification | Cervical | Mean AUCPR | 14.15 | 4 |
| Tabular Synthetic Data Utility Evaluation | Cervical (test) | Mean AUROC | 0.5451 | 4 |
| Tabular Classification | Cardio | Mean AUCPR | 67.07 | 4 |
| Tabular Classification | Adult | Mean AUCPR | 39.59 | 4 |
| Tabular Synthetic Data Utility Evaluation | Breast (test) | Mean AUROC | 74.54 | 3 |
| Tabular Classification | Breast | Mean AUCPR | 59.53 | 3 |
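The abstract attributes the Adult AUCPR gap to the positive-class convention. The toy sketch below, with hypothetical labels and scores rather than data from the paper, illustrates why this can happen. Swapping which class counts as "positive" and scoring it with 1 − p leaves AUROC unchanged, because AUROC depends only on the pairwise ordering of the two classes. Average precision (a common AUCPR estimator) changes, because it is anchored to the positive class and its prevalence. The helper names are illustrative. For distinct scores, `average_precision` here should agree with the step-wise definition used by scikit-learn's `average_precision_score`.

```python
def auroc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values at each positive's rank (assumes distinct scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical labels and scores for a mediocre classifier (30% prevalence).
y = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
s = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
# Same predictions, with the other class treated as "positive".
y_flip = [1 - v for v in y]
s_flip = [1 - v for v in s]

print(f"AUROC {auroc(y, s):.3f} vs {auroc(y_flip, s_flip):.3f}")  # AUROC 0.714 vs 0.714
print(f"AP    {average_precision(y, s):.3f} vs {average_precision(y_flip, s_flip):.3f}")  # AP    0.643 vs 0.880
```

The same predictions yield identical AUROC but average precision values that differ by more than 0.2. For this reason, AUCPR comparisons across evaluation pipelines are only meaningful when both pipelines use the same positive class.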