CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis
About
Tabular data has been receiving growing attention, and synthetic tables are now being applied across an expanding range of tasks. Owing to recent advances in generative modeling, data produced by tabular synthesis models has become sophisticated and realistic. However, modeling the discrete variables (columns) of tabular data remains difficult. In this work, we propose to process continuous and discrete variables separately, each conditioned on the other, using two diffusion models. The two diffusion models co-evolve during training by reading conditions from each other. To bind the two diffusion models further, we also introduce a contrastive learning method with negative sampling. In experiments with 11 real-world tabular datasets and 8 baseline methods, we demonstrate the efficacy of the proposed method, called CoDi.
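The core idea above — two forward diffusion processes, Gaussian for continuous columns and multinomial for discrete columns, with each denoiser conditioned on the other modality — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the cosine schedule, and the shapes are assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_continuous(x_c, t, T=100):
    """Gaussian forward diffusion on continuous columns (DDPM-style).

    Uses a simplified cosine noise schedule (an assumption, not CoDi's exact schedule).
    """
    alpha_bar = np.cos(0.5 * np.pi * t / T) ** 2
    eps = rng.standard_normal(x_c.shape)
    x_c_t = np.sqrt(alpha_bar) * x_c + np.sqrt(1.0 - alpha_bar) * eps
    return x_c_t, eps

def noise_discrete(x_d, t, n_cats, T=100):
    """Multinomial forward diffusion on discrete columns: each entry is
    resampled uniformly over its categories with probability 1 - alpha_bar."""
    alpha_bar = np.cos(0.5 * np.pi * t / T) ** 2
    resample = rng.random(x_d.shape) > alpha_bar
    uniform = rng.integers(0, n_cats, size=x_d.shape)
    return np.where(resample, uniform, x_d)

# One conceptual co-evolving training step on a toy mini-batch:
x_c = rng.standard_normal((4, 3))       # 4 rows, 3 continuous columns
x_d = rng.integers(0, 5, size=(4, 2))   # 4 rows, 2 discrete columns, 5 categories
t = rng.integers(1, 100)                # shared diffusion timestep

x_c_t, eps = noise_continuous(x_c, t)
x_d_t = noise_discrete(x_d, t, n_cats=5)

# In CoDi, a continuous-side model would predict eps from (x_c_t, t) conditioned
# on x_d_t, while a discrete-side model recovers x_d from (x_d_t, t) conditioned
# on x_c_t; training both jointly is what co-evolves the two diffusion models.
```

The key point the sketch illustrates is the cross-conditioning: each model's input at timestep `t` includes the *other* modality's noisy sample, so gradients on one side depend on the other side's diffusion process.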
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Tabular Data Utility | Magic (test) | AUC | 0.931 | 14 |
| Tabular Data Utility | California (test) | AUC | 0.981 | 14 |
| Tabular Data Utility | Adult (test) | AUC | 0.829 | 14 |
| Tabular Data Utility | Default (test) | AUC | 0.497 | 14 |
| Tabular Data Utility | Shoppers (test) | AUC | 0.855 | 13 |
| Tabular Data Synthesis | Aggregate of five tabular datasets (full train vs original train) | Marginal Error | 21.7 | 13 |
| Tabular Data Generation | Adult (test) | MLE | 0.871 | 12 |
| Tabular Data Generation | Magic (test) | MLE | 0.932 | 12 |
| Tabular Data Generation | Shoppers (test) | MLE | 0.865 | 12 |
| Tabular Data Generation | Beijing (test) | MLE | 0.818 | 12 |