Flow Matching for Tabular Data Synthesis

About

Synthetic data generation is an important tool for privacy-preserving data sharing. Although diffusion models have set recent benchmarks, flow matching (FM) offers a promising alternative. This paper presents different ways to implement FM for tabular data synthesis. We provide a comprehensive empirical study that compares flow matching (FM and variational FM) with state-of-the-art diffusion methods (TabDDPM and TabSyn) in tabular data synthesis. We evaluate both the standard Optimal Transport (OT) and the Variance Preserving (VP) probability paths, and also compare deterministic and stochastic samplers (a comparison made possible by learning to generate with variational FM), characterising the empirical relationship between data utility and privacy risk. Our key findings reveal that FM, particularly TabbyFlow, outperforms diffusion baselines. Flow matching methods also achieve better performance with remarkably few function evaluations (≤ 100 steps), offering a substantial computational advantage. The choice of probability path is also crucial: OT is a strong default and more robust to early stopping on average, while VP has the potential to produce synthetic data with lower privacy risk. Lastly, our results show that making flows stochastic not only preserves marginal distributions but, in some instances, enables the generation of high-utility synthetic data with reduced disclosure risk. The implementation code associated with this paper is publicly available at https://github.com/rulnasution/tabular-flow-matching.
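To make the OT probability path and the deterministic sampler mentioned in the abstract more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`ot_path_sample`, `ot_target_velocity`, `fm_loss`, `euler_sample`) are hypothetical, it assumes purely continuous features with a standard Gaussian source, and it uses the commonly cited straight-line form of the OT conditional path, x_t = (1 - t) x_0 + t x_1, with target velocity x_1 - x_0. Categorical columns, the VP path, variational FM, and stochastic sampling are not covered.

```python
import numpy as np


def ot_path_sample(x0, x1, t):
    """Point on the straight-line (OT) conditional path from noise x0 to data x1.

    x0, x1: arrays of shape (batch, features); t: array of shape (batch,) in [0, 1].
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    return (1.0 - t) * x0 + t * x1


def ot_target_velocity(x0, x1):
    """Regression target for the velocity along the OT conditional path."""
    return x1 - x0


def fm_loss(velocity_fn, x1, rng):
    """Monte Carlo estimate of the conditional flow matching loss on a data batch x1.

    velocity_fn(x, t) stands in for a learned model (an assumption of this sketch).
    """
    x0 = rng.standard_normal(x1.shape)   # Gaussian source sample
    t = rng.uniform(size=x1.shape[0])    # one time per row
    xt = ot_path_sample(x0, x1, t)
    pred = velocity_fn(xt, t)
    return float(np.mean((pred - ot_target_velocity(x0, x1)) ** 2))


def euler_sample(velocity_fn, x0, n_steps=100):
    """Deterministic sampler: integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full(x.shape[0], k * h)
        x = x + h * velocity_fn(x, t)
    return x
```

In this sketch, `n_steps` plays the role of the number of function evaluations, so `n_steps=100` corresponds to the "≤ 100 steps" budget quoted above. A trained network would replace `velocity_fn`.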

Bahrul Ilmi Nasution, Floor Eijkelboom, Mark Elliot, Richard Allmendinger, Christian A. Naesseth • 2025

Related benchmarks

Task                               Dataset                      Metric    Result   Rank
Tabular Data Synthesis             Indonesia Census ID (test)   Utility   0.9191   6
Tabular Data Synthesis             adult (AD) (test)            Utility   0.772    6
Tabular Data Synthesis             Churn (CH) (test)            Utility   0.8784   6
Tabular Synthetic Data Generation  UK Census                    Utility   83.33    6
Tabular Synthetic Data Generation  CA Census                    Utility   0.7718   6
Tabular Synthetic Data Generation  FI Census                    Utility   74.51    6
Tabular Synthetic Data Generation  RW Census                    Utility   73.58    6