PrivSyn: Differentially Private Data Synthesis

About

In differential privacy (DP), a challenging problem is generating synthetic datasets that efficiently capture the useful information in the private data. A synthetic dataset allows any task to be performed without privacy concerns and without modifying existing algorithms. In this paper, we present PrivSyn, the first automatic synthetic data generation method that can handle general tabular datasets (with 100 attributes and domain size $>2^{500}$). PrivSyn is composed of a new method to automatically and privately identify correlations in the data, and a novel method to generate sample data from a dense graphical model. We extensively evaluate different methods on multiple datasets to demonstrate the performance of our method.

Zhikun Zhang, Tianhao Wang, Ninghui Li, Jean Honorio, Michael Backes, Shibo He, Jiming Chen, Yang Zhang • 2020

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	MuJoCo HalfCheetah	Normalized Return 2.4	97
Offline Reinforcement Learning	Kitchen Partial	Normalized Score 0.2	69
Offline Reinforcement Learning	Maze2D medium	Normalized Return 31.6	38
Offline Reinforcement Learning	Maze2D umaze	Normalized Return 5.7	38
Offline Reinforcement Learning	Maze2D large	Normalized Return 3.4	33
Classification	Br2000 (test)	Accuracy 75.92	30
Classification	Adult dataset	Accuracy 76.53	30
Classification	LPD	Accuracy 33.62	27
Classification	DP Scaled Datasets 2x	Accuracy 54.38	21
Classification	DP Scaled Datasets 3x	Accuracy 54.46	21

Showing 10 of 27 rows
