Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Risk-Equalized Differentially Private Synthetic Data: Protecting Outliers by Controlling Record-Level Influence

About

When synthetic data is released, some individuals are harder to protect than others. A patient with a rare disease combination or a transaction with unusual characteristics stands out from the crowd. Differential privacy provides worst-case guarantees, but empirical attacks -- particularly membership inference -- succeed far more often against such outliers, especially under moderate privacy budgets and with auxiliary information. This paper introduces risk-equalized DP synthesis, a framework that prioritizes protection for high-risk records by reducing their influence on the learned generator. The mechanism operates in two stages: first, a small privacy budget estimates each record's "outlierness"; second, a DP learning procedure weights each record inversely to its risk score. Under Gaussian mechanisms, a record's privacy loss is proportional to its influence on the output -- so deliberately shrinking outliers' contributions yields tighter per-instance privacy bounds for precisely those records that need them most. We prove end-to-end DP guarantees via composition and derive closed-form per-record bounds for the synthesis stage (the scoring stage adds a uniform per-record term). Experiments on simulated data with controlled outlier injection show that risk-weighting substantially reduces membership inference success against high-outlierness records; ablations confirm that targeting -- not random downweighting -- drives the improvement. On real-world benchmarks (Breast Cancer, Adult, German Credit), gains are dataset-dependent, highlighting the interplay between scorer quality and synthesis pipeline.

Amir Asiaee, Chao Yan, Zachary B. Abrams, Bradley A. Malin• 2026

Related benchmarks

TaskDatasetResultRank
DP Synthetic Data GenerationGerman Credit
TSTR68.2
24
Synthetic Data Generationbreast_cancer
TSTR0.963
24
Synthetic Data GenerationAdult Dataset (test)
TSTR88.2
24
Showing 3 of 3 rows

Other info

Follow for update