
HeteroFedSyn: Differentially Private Tabular Data Synthesis for Heterogeneous Federated Settings

About

Traditional Differential Privacy (DP) mechanisms are typically tailored to specific analysis tasks, which limits the reusability of protected data. DP tabular data synthesis overcomes this by generating synthetic datasets that can be shared for arbitrary downstream tasks. However, existing synthesis methods predominantly assume centralized or local settings and overlook the more practical horizontal federated scenario. Naively synthesizing data locally or perturbing individual records either produces biased mixtures or introduces excessive noise, especially under heterogeneous data distributions across participants. We propose HeteroFedSyn, the first DP tabular data synthesis framework designed specifically for the horizontal federated setting. Built upon the PrivSyn paradigm of 2-way marginal-based synthesis, HeteroFedSyn introduces three key innovations for distributed marginal selection: (i) an L2-based dependency metric with random projection for noise-efficient correlation measurement, (ii) an unbiased estimator to correct multiplicative noise, and (iii) an adaptive selection strategy that dynamically updates dependency scores to avoid redundancy. Extensive experiments on range queries, Wasserstein fidelity, and machine learning tasks show that, despite the increased noise inherent to federated execution, HeteroFedSyn achieves utility comparable to centralized synthesis. Our code is open-sourced via the link.
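The abstract does not give the exact form of the dependency metric or the estimator, so the following is only a rough illustration under stated assumptions. It assumes the dependency between two attributes is the L2 distance between their normalized 2-way marginal and the product of their 1-way marginals. It compresses that deviation vector with a Gaussian random projection, so noise is added in k dimensions rather than one per marginal cell. It then subtracts the bias that additive Gaussian noise introduces into a squared norm. This additive-noise debiasing is a stand-in for intuition, not HeteroFedSyn's estimator for multiplicative noise, and the function names are hypothetical.

```python
import random

# Hypothetical helpers illustrating an L2 dependency score, not HeteroFedSyn's code.

def dependency_vector(joint):
    """Deviation of a normalized 2-way marginal (rows of counts) from the
    product of its 1-way marginals, flattened row-major. All zeros iff the
    two attributes are independent in this table."""
    total = sum(sum(row) for row in joint)
    p = [[c / total for c in row] for row in joint]
    rows = [sum(r) for r in p]
    cols = [sum(col) for col in zip(*p)]
    return [p[i][j] - rows[i] * cols[j]
            for i in range(len(rows)) for j in range(len(cols))]

def noisy_projected_sq_norm(v, k, sigma, rng):
    """Unbiased estimate of ||v||^2 from a k-dim Gaussian random projection
    of v with N(0, sigma^2) noise added to each projected coordinate."""
    scale = 1.0 / k ** 0.5  # entries ~ N(0, 1/k), so E[||R v||^2] = ||v||^2
    y = []
    for _ in range(k):
        proj = sum(rng.gauss(0.0, scale) * x for x in v)
        y.append(proj + rng.gauss(0.0, sigma))  # noise lives in k dims, not len(v)
    # E[||y||^2] = ||v||^2 + k * sigma^2, so subtract the noise bias
    return sum(t * t for t in y) - k * sigma ** 2

# Usage: two perfectly dependent binary attributes.
rng = random.Random(0)
v = dependency_vector([[10, 0], [0, 10]])
true_score = sum(x * x for x in v)  # 0.25
est = sum(noisy_projected_sq_norm(v, 4, 0.01, rng) for _ in range(2000)) / 2000
```

A single noisy estimate can be far from the true score when k is small; the averaging in the usage lines only demonstrates that the estimator is unbiased, which is the property that makes the subtraction step worthwhile.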

Xiaochen Li, Fengyu Gao, Xizixiang Wei, Tianhao Wang, Cong Shen, Jing Yang • 2026

Related benchmarks

Task                          Dataset     Metric           Result   Rank
Data Distribution Fidelity    Adult       Fidelity Error   0.034    18
Marginal Query Answering      Adult       Query Error      0.5      18
Downstream Regression         Abalone     ML Efficiency    3.247    9
Downstream Regression         Insurance   ML Efficiency    151.3    6
