CTAB-GAN+: Enhancing Tabular Data Synthesis

About

While data sharing is crucial for knowledge development, privacy concerns and strict regulations (e.g., the European General Data Protection Regulation (GDPR)) limit its full effectiveness. Synthetic tabular data emerges as an alternative that enables data sharing while fulfilling regulatory and privacy constraints. State-of-the-art tabular data synthesizers draw their methodologies from Generative Adversarial Networks (GANs). As GANs improve, the synthesized data increasingly resembles the real data, which risks leaking private information. Differential privacy (DP) provides theoretical guarantees on privacy loss but degrades data utility. Striking the best trade-off remains a challenging research question. We propose CTAB-GAN+, a novel conditional tabular GAN. CTAB-GAN+ improves upon the state of the art by (i) adding downstream losses to conditional GANs for higher-utility synthetic data in both classification and regression domains; (ii) using Wasserstein loss with gradient penalty for better training convergence; (iii) introducing novel encoders targeting mixed continuous-categorical variables and variables with unbalanced or skewed data; and (iv) training with DP stochastic gradient descent to impose strict privacy guarantees. We extensively evaluate CTAB-GAN+ on data similarity and analysis utility against state-of-the-art tabular GANs. The results show that CTAB-GAN+ synthesizes privacy-preserving data with at least 48.16% higher utility across multiple datasets and learning tasks under different privacy budgets.

Zilong Zhao, Aditya Kunar, Robert Birke, Lydia Y. Chen • 2022
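
Item (iv) of the abstract mentions training with DP stochastic gradient descent. As a rough illustration of that general technique, and not the authors' implementation, the sketch below shows one DP-SGD aggregation step: each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise is added, and the result is averaged. The function name `dp_sgd_gradient` and the parameters `clip_norm` and `noise_multiplier` are assumptions chosen for this example; how CTAB-GAN+ applies the step inside GAN training is not specified here.

```python
import math
import random


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient from a list of per-example gradients.

    per_example_grads: list of equal-length lists of floats, one per example.
    clip_norm, noise_multiplier: hypothetical hyperparameter names.
    rng: a random.Random instance used to draw the Gaussian noise.
    """
    batch_size = len(per_example_grads)
    num_params = len(per_example_grads[0])
    summed = [0.0] * num_params
    for grad in per_example_grads:
        # L2 norm of this example's gradient.
        norm = math.sqrt(sum(g * g for g in grad))
        # Shrink the gradient only if its norm exceeds clip_norm.
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, g in enumerate(grad):
            summed[i] += g * scale
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    return [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]


# Two toy gradients: the first has norm 5 and is clipped, the second
# has norm 0.5 and is left unchanged.
toy_grads = [[3.0, 4.0], [0.3, 0.4]]
noiseless = dp_sgd_gradient(toy_grads, clip_norm=1.0, noise_multiplier=0.0,
                            rng=random.Random(0))
noisy = dp_sgd_gradient(toy_grads, clip_norm=1.0, noise_multiplier=1.0,
                        rng=random.Random(0))
```

With `noise_multiplier=0.0` the call reduces to plain clipped averaging, which makes the clipping effect easy to inspect; a larger multiplier adds more noise and so tends to reduce utility, which is the trade-off the abstract describes.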

Related benchmarks

Task | Dataset | Result | Rank
Classification | Diabetes (test) | -- | 32
Classification | Thyroid | F1 Score 27.46 | 17
Classification | SICK | F1 Score 82.35 | 15
Tabular Classification | diabetes 37 (test) | Test Error 73.4 | 15
Classification | Income (test) | F1 Score 66.49 | 9
Classification | thyroid (test) | F1 Score 27.46 | 9
Classification | HELOC | F1 Score 71.03 | 7
Classification | Travel | F1 Score 54.66 | 7
Classification | Income | F1 Score 66.49 | 7
Tabular Data Generation | CH | MLE 0.702 | 6
Showing 10 of 31 rows
