
Modeling Tabular data using Conditional GAN

About

Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes, whereas discrete columns are sometimes imbalanced, making the modeling difficult. Existing statistical and deep neural network models fail to properly model this type of data. We design TGAN, which uses a conditional generative adversarial network to address these challenges. To aid in a fair and thorough comparison, we design a benchmark with 7 simulated and 8 real datasets and several Bayesian network baselines. TGAN outperforms Bayesian methods on most of the real datasets, whereas other deep learning methods could not.
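The abstract does not spell out how a conditional generator can cope with imbalanced discrete columns. As an illustration only, the sketch below builds a conditional vector by picking a discrete column uniformly at random and then a category with probability proportional to the log of its count, so rare categories are requested more often than their raw share. The column names and counts, the helpers `log_frequency_pmf` and `sample_condition`, and the log-frequency weighting itself are hypothetical choices made for this example, not code taken from the paper or any released implementation.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical discrete columns with their category counts in a training table.
DISCRETE_COLUMNS: Dict[str, Dict[str, int]] = {
    "workclass": {"Private": 900, "Self-emp": 80, "Never-worked": 20},
    "sex": {"Male": 600, "Female": 400},
}


def log_frequency_pmf(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw category counts into a PMF proportional to log(1 + count).

    Taking the log flattens the distribution, so rare categories are
    sampled more often than their raw frequency would suggest.
    """
    weights = {cat: math.log1p(n) for cat, n in counts.items()}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}


def sample_condition(
    columns: Dict[str, Dict[str, int]], rng: random.Random
) -> Tuple[str, str, List[int]]:
    """Sample (column, category, conditional vector) for one training step.

    The conditional vector concatenates one one-hot block per discrete column;
    only the chosen category of the chosen column is set to 1.
    """
    names = list(columns)
    chosen_col = rng.choice(names)  # every discrete column is equally likely
    pmf = log_frequency_pmf(columns[chosen_col])
    cats = list(pmf)
    chosen_cat = rng.choices(cats, weights=[pmf[c] for c in cats], k=1)[0]

    cond: List[int] = []
    for name in names:
        for cat in columns[name]:
            cond.append(1 if (name == chosen_col and cat == chosen_cat) else 0)
    return chosen_col, chosen_cat, cond


if __name__ == "__main__":
    rng = random.Random(0)
    print(log_frequency_pmf(DISCRETE_COLUMNS["workclass"]))
    for _ in range(3):
        print(sample_condition(DISCRETE_COLUMNS, rng))
```

With these made-up counts, "Never-worked" makes up 2% of rows but receives roughly 21% of the conditioning probability within its column, which is the rebalancing effect this kind of sampling aims for.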

Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni · 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Tabular Data Synthesis Fidelity | biodeg | KS Statistic (Mean) | 0.49 | 90 |
| Tabular Data Synthesis Fidelity | steel | KS Statistic (Mean) | 0.61 | 90 |
| Tabular Data Synthesis Fidelity | fourier | KS Fidelity | 0.67 | 88 |
| Tabular Data Synthesis Fidelity | PROTEIN | Mean KS Statistic | 0.69 | 88 |
| Tabular Data Synthesis Fidelity | Texture | KS Statistic (Mean) | 0.82 | 64 |
| Cardiac risk prediction | Clinical cardiac rehabilitation dataset | F1 Score (Risk) | 65.65 | 60 |
| Regression | California Housing (CH) (test) | MSE | 0.35 | 52 |
| Classification | Credit | ROC AUC | 63.7 | 50 |
| Tabular Data Synthesis | fourier | Chi-squared Result | 0.00e+0 | 48 |
| Tabular Data Synthesis | biodeg | Chi-Squared Test Result | 0.04 | 47 |

Showing 10 of 190 rows.
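Several fidelity rows above report a mean Kolmogorov-Smirnov (KS) statistic. The listing does not say how each benchmark defines it: some report the raw two-sample KS statistic (lower means closer distributions), others report its complement 1 - KS (higher is better), so the numbers cannot be compared across rows without checking the source benchmark. The sketch below assumes the raw statistic, computed per continuous column and averaged; the helper names `ks_statistic` and `mean_ks` and the example columns are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Two-sample KS statistic: sup over x of |F_real(x) - F_synthetic(x)|."""
    a, b = sorted(real), sorted(synthetic)
    n, m = len(a), len(b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so evaluating them at every observed point finds the supremum.
    for x in a + b:
        fa = bisect_right(a, x) / n
        fb = bisect_right(b, x) / m
        d = max(d, abs(fa - fb))
    return d


def mean_ks(real: Dict[str, List[float]], synthetic: Dict[str, List[float]]) -> float:
    """Average the per-column KS statistic over columns present in both tables."""
    cols = sorted(real.keys() & synthetic.keys())
    return sum(ks_statistic(real[c], synthetic[c]) for c in cols) / len(cols)


if __name__ == "__main__":
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
    real = {"age": [1, 2, 3, 4], "income": [0, 1]}
    synthetic = {"age": [3, 4, 5, 6], "income": [5, 6]}
    print(mean_ks(real, synthetic))  # 0.75
```

A benchmark that reports the complement would show 0.25 for this toy example instead of 0.75.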
