Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space

About

Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging because tabular data have intricately varied distributions and mix several data types. This paper introduces Tabsyn, a methodology that synthesizes tabular data by leveraging a diffusion model within a latent space crafted by a variational autoencoder (VAE). The key advantages of the proposed Tabsyn include (1) Generality: the ability to handle a broad spectrum of data types by converting them into a single unified space, and to explicitly capture inter-column relations; (2) Quality: optimizing the distribution of latent embeddings to enhance the subsequent training of diffusion models, which helps generate high-quality synthetic data; (3) Speed: far fewer reverse steps and faster synthesis than existing diffusion-based methods. Extensive experiments on six datasets with five metrics demonstrate that Tabsyn outperforms existing methods. Specifically, it reduces the error rates by 86% and 67% for column-wise distribution and pair-wise column correlation estimations compared with the most competitive baselines.

Hengrui Zhang, Jiani Zhang, Balasubramaniam Srinivasan, Zhengyuan Shen, Xiao Qin, Christos Faloutsos, Huzefa Rangwala, George Karypis • 2023
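
The abstract reports an error rate for column-wise distribution estimation but does not define it here. As a rough illustration only, the sketch below shows one common way such an error could be computed: the two-sample Kolmogorov-Smirnov statistic for numerical columns and the total variation distance for categorical columns, averaged over columns. The choice of these two measures, the dict-of-lists table layout, and every function name are assumptions made for this example; this is not the authors' evaluation code.

```python
from bisect import bisect_right
from collections import Counter


def total_variation_distance(real, synth):
    """TVD between the empirical category frequencies of two columns."""
    p, q = Counter(real), Counter(synth)
    categories = set(p) | set(q)
    # Counter returns 0 for categories missing from one of the columns.
    return 0.5 * sum(abs(p[c] / len(real) - q[c] / len(synth)) for c in categories)


def ks_statistic(real, synth):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(real), sorted(synth)
    # The gap only changes at observed values, so checking those is enough.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def column_wise_error(real_table, synth_table, categorical):
    """Average per-column discrepancy (hypothetical helper).

    Tables are dicts mapping column name -> list of values;
    `categorical` is the set of column names treated as categorical.
    """
    errors = []
    for name, real_col in real_table.items():
        metric = total_variation_distance if name in categorical else ks_statistic
        errors.append(metric(real_col, synth_table[name]))
    return sum(errors) / len(errors)


# Toy data, invented for this example.
real = {"age": [1, 2, 3, 4], "job": ["a", "a", "b", "b"]}
synth = {"age": [1, 2, 3, 4], "job": ["a", "a", "a", "b"]}
error = column_wise_error(real, synth, categorical={"job"})
```

Both measures lie in [0, 1], with 0 meaning the real and synthetic columns have identical empirical distributions, so a lower average indicates closer column-wise agreement.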

Related benchmarks

Task | Dataset | Result | Rank
Tabular Data Utility | Magic (test) | AUC 0.934 | 14
Tabular Data Utility | Default (test) | AUC 0.764 | 14
Tabular Data Utility | California (test) | AUC 0.993 | 14
Tabular Data Utility | Adult (test) | AUC 0.904 | 14
Tabular Data Synthesis | Aggregate of five tabular datasets (full train vs original train) | Marginal Error 1.4 | 13
Tabular Data Utility | Shoppers (test) | AUC 0.913 | 13
Tabular Data Generation | Magic (test) | MLE 0.938 | 12
Tabular Data Generation | Beijing (test) | MLE 0.582 | 12
Tabular Data Generation | Adult (test) | MLE 0.915 | 12
Tabular Data Generation | Shoppers (test) | MLE 0.92 | 12
Showing 10 of 20 rows
