TabularARGN: A Flexible and Efficient Auto-Regressive Framework for Generating High-Fidelity Synthetic Data
About
Synthetic data generation for tabular datasets must balance fidelity, efficiency, and versatility to meet the demands of real-world applications. We introduce the Tabular Auto-Regressive Generative Network (TabularARGN), a flexible framework designed to handle mixed-type, multivariate, and sequential datasets. By training on all possible conditional probabilities, TabularARGN supports advanced features such as fairness-aware generation, imputation, and conditional generation on any subset of columns. The framework achieves state-of-the-art synthetic data quality while significantly reducing training and inference times, making it ideal for large-scale datasets with diverse structures. Evaluated across established benchmarks, including realistic datasets with complex relationships, TabularARGN demonstrates its capability to synthesize high-quality data efficiently. By unifying flexibility and performance, this framework paves the way for practical synthetic data generation across industries.
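The idea of conditional generation on any subset of columns can be illustrated with a minimal sketch. It is a toy, not the TabularARGN implementation: it estimates each conditional distribution from empirical counts rather than from a learned network, and the function `sample_row`, the column names, and the example rows are all hypothetical. The sketch fixes the conditioning columns, then fills in the remaining columns one at a time in a random order, each drawn conditionally on everything set so far.

```python
import random
from collections import Counter


def sample_row(rows, columns, fixed=None, rng=None):
    """Sample one row column by column, conditioning on values already set.

    Toy stand-in for an auto-regressive model: p(column | earlier values)
    is estimated by counting matching training rows (an assumption made
    for illustration, not the paper's method).
    """
    rng = rng or random.Random()
    row = dict(fixed or {})  # conditioning values are kept as given

    # Generate the remaining columns in an arbitrary (shuffled) order.
    order = [c for c in columns if c not in row]
    rng.shuffle(order)

    for col in order:
        # Training rows that agree with every value set so far.
        matches = [r for r in rows if all(r[k] == v for k, v in row.items())]
        if not matches:
            matches = rows  # back off to the marginal distribution
        counts = Counter(r[col] for r in matches)
        values, weights = zip(*counts.items())
        row[col] = rng.choices(values, weights=weights, k=1)[0]
    return row


# Hypothetical toy dataset with three categorical columns.
columns = ["age", "income", "owns_home"]
rows = [
    {"age": "young", "income": "low", "owns_home": "no"},
    {"age": "young", "income": "high", "owns_home": "no"},
    {"age": "old", "income": "high", "owns_home": "yes"},
    {"age": "old", "income": "low", "owns_home": "yes"},
]

# Condition on one column and generate the rest.
sample = sample_row(rows, columns, fixed={"age": "old"}, rng=random.Random(0))
print(sample)
```

Because the generation order is shuffled on every call, the same routine handles unconditional sampling (`fixed=None`), conditional sampling on any subset of columns, and imputation (pass the observed cells as `fixed`). A count-based estimator like this one only reproduces combinations already present in the training rows; a learned model would be needed to generalize beyond them.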
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Tabular Data Synthesis | Adult | Shape Similarity | 0.985 | 17 |
| Tabular Data Synthesis | Diabetes | Shapes | 0.989 | 15 |
| Privacy Evaluation | Adult | -- | -- | 10 |
| Privacy Evaluation | Diabetes | -- | -- | 9 |
| Synthetic Data Detection | Adult | Overall Score | 0.733 | 7 |
| Synthetic Data Utility | Adult | Overall Score | 97.1 | 7 |
| Synthetic Data Detection | Diabetes | Overall Score | 78.9 | 6 |
| Synthetic Data Utility | Diabetes | Overall Score | 98 | 6 |
| Privacy Evaluation | Electric Vehicles | Overall Score | 0.998 | 4 |
| Synthetic Data Utility | Electric Vehicles | Overall Score | 97.8 | 4 |