
TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation

About

Synthesizing high-quality tabular data is an important topic in many data science tasks, ranging from dataset augmentation to privacy protection. However, developing expressive generative models for tabular data is challenging due to its inherent heterogeneous data types, complex inter-correlations, and intricate column-wise distributions. In this paper, we introduce TabDiff, a joint diffusion framework that models all mixed-type distributions of tabular data in one model. Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data, where we propose feature-wise learnable diffusion processes to counter the high disparity of different feature distributions. TabDiff is parameterized by a transformer handling different input types, and the entire framework can be efficiently optimized in an end-to-end fashion. We further introduce a mixed-type stochastic sampler to automatically correct the accumulated decoding error during sampling, and propose classifier-free guidance for conditional missing column value imputation. Comprehensive experiments on seven datasets demonstrate that TabDiff achieves superior average performance over existing competitive baselines across all eight metrics, with up to $22.5\%$ improvement over the state-of-the-art model on pair-wise column correlation estimations. Code is available at https://github.com/MinkaiXu/TabDiff.
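As a rough illustration of the joint forward process the abstract describes, the sketch below noises a mixed-type row: numerical features receive Gaussian noise and categorical features are replaced by a mask token, each under its own per-feature schedule. This is not the authors' implementation. The schedule forms `t ** rho` and `t ** k`, the `MASK` token, the use of masking for categorical features, and all function names are assumptions made for illustration, and the per-feature parameters are fixed constants here rather than learned.

```python
import random

MASK = "[MASK]"  # hypothetical mask token for categorical features


def sigma(t, rho, sigma_max=1.0):
    """Assumed per-feature noise level: rises from 0 to sigma_max as t goes 0 -> 1."""
    return sigma_max * t ** rho


def mask_prob(t, k):
    """Assumed per-feature masking probability: rises from 0 to 1 as t goes 0 -> 1."""
    return t ** k


def noise_row(num_values, cat_values, t, rhos, ks, rng):
    """Noise one mixed-type row at diffusion time t in [0, 1].

    rhos and ks hold one schedule parameter per numerical and per
    categorical feature, so each column is corrupted at its own rate.
    """
    noisy_num = [
        x + sigma(t, rho) * rng.gauss(0.0, 1.0)
        for x, rho in zip(num_values, rhos)
    ]
    noisy_cat = [
        MASK if rng.random() < mask_prob(t, k) else c
        for c, k in zip(cat_values, ks)
    ]
    return noisy_num, noisy_cat


if __name__ == "__main__":
    rng = random.Random(0)
    # Two numerical and two categorical features, each with its own schedule.
    print(noise_row([1.5, -2.0], ["a", "b"], 0.5, [1.0, 3.0], [1.0, 2.0], rng))
```

At `t = 0` the row is returned unchanged, and at `t = 1` every categorical value is masked. A larger `rho` or `k` keeps that feature cleaner for longer, which is the kind of per-column flexibility the feature-wise schedules are meant to provide.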

Juntong Shi, Minkai Xu, Harper Hua, Hengrui Zhang, Stefano Ermon, Jure Leskovec • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Tabular Data Generation | magic | DCR-00250.11 | 20 |
| Tabular Data Generation | Beijing | DCR-0020.5139 | 20 |
| Tabular Data Generation | News | DCR-0020.8178 | 18 |
| Tabular Data Synthesis | Adult | Shape Similarity 0.9932 | 17 |
| Tabular Data Synthesis | Diabetes | Shapes 0.99 | 15 |
| Minority class representation | BC | Minority Class Percentage 21.7 | 13 |
| Utility Evaluation | CR | Balanced Acc 61.4 | 13 |
| Utility Evaluation | CC | Balanced Acc 64.2 | 13 |
| Minority class representation | AD | Minority Class Percentage 22.8 | 13 |
| Minority class representation | CC | Minority Class Percentage 20.3 | 13 |
Showing 10 of 117 rows
