
Synthesizing real-world distributions from high-dimensional Gaussian Noise with Fully Connected Neural Network

About

The use of synthetic data in machine learning applications and research offers many benefits, including performance improvements through data augmentation, privacy preservation of original samples, and reliable method assessment with fully synthetic data. This work proposes a time-efficient synthetic data generation method based on a fully connected neural network and a randomized loss function that transforms a random Gaussian distribution to approximate a target real-world dataset. The experiments conducted on 25 diverse tabular real-world datasets confirm that the proposed solution surpasses the state-of-the-art generative methods and achieves reference MMD scores orders of magnitude faster than modern deep learning solutions. The experiments involved analyzing distributional similarity, assessing the impact on classification quality, and using PCA for dimensionality reduction, which further enhances data privacy and can boost classification quality while reducing time and memory complexity.

Joanna Komorniczak • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Data Synthesis | Yeast | MMD 0.157 | 8 |
| Data Synthesis | Hypothyroid | MMD 0.065 | 8 |
| Data Synthesis | lawsuit | MMD 0.176 | 8 |
| Data Synthesis | profb | MMD 0.058 | 8 |
| Data Synthesis | tic_tac_toe | MMD 0.097 | 8 |
| Data Synthesis | Biomed | MMD 0.065 | 8 |
| Data Synthesis | ionosphere | MMD 0.08 | 8 |
| Data Synthesis | glass2 | MMD 0.063 | 8 |
| Data Synthesis | parity5+5 | MMD 0.13 | 8 |
| Data Synthesis | allrep | MMD 0.16 | 8 |
Showing 10 of 49 rows
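
The benchmark results above are reported as MMD (maximum mean discrepancy), a kernel-based distance between two sets of samples where lower values mean the synthetic data is closer to the real data. As a rough illustration of how such a score can be computed, here is a minimal sketch of the biased squared-MMD estimate. The RBF kernel, the `gamma` bandwidth, and the function names `rbf_kernel` and `mmd2` are assumptions made for this example. The page does not state which kernel or estimator the paper uses, so values from this sketch should not be expected to match the table.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    # Squared Euclidean distance between every pair of rows.
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD between sample sets x and y.

    The kernel choice and gamma are assumptions for illustration only.
    """
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 5))                  # stand-in for a real dataset
    close = rng.normal(size=(200, 5))                 # samples from the same distribution
    shifted = rng.normal(loc=3.0, size=(200, 5))      # samples from a shifted distribution
    print("same distribution:   ", mmd2(real, close))
    print("shifted distribution:", mmd2(real, shifted))
```

In this toy setup the shifted sample should score a noticeably larger MMD than the sample drawn from the same distribution, which is the sense in which a lower score indicates a closer match.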
