
Differentially Private Linear Regression and Synthetic Data Generation with Statistical Guarantees

About

In the social sciences, small- to medium-scale datasets are common, and linear regression is canonical. In privacy-aware settings, much work has focused on differentially private (DP) linear regression, but mostly on point estimation with limited attention to uncertainty quantification. Meanwhile, synthetic data generation (SDG) is increasingly important for reproducibility studies, yet current DP linear regression methods do not readily support it. Mainstream DP-SDG approaches either are tailored to discrete or discretized data, making them less suitable for analyses involving continuous variables, or rely on deep learning models that require large datasets, limiting their use for the smaller-scale data typical in social science. We propose a method for linear regression with valid inference under Gaussian DP. It includes a bias-corrected estimator with asymptotic confidence intervals (CIs) and a general SDG procedure such that the corresponding regression on the synthetic data matches our DP linear regression procedure. Our approach is effective in small- to moderate-dimensional settings. Experiments show that our method (1) improves accuracy over existing methods for DP linear regression, (2) provides valid CIs, and (3) produces more reliable synthetic data for downstream statistical and machine learning tasks than current DP synthesizers.
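As a rough illustration of what linear regression under Gaussian DP can look like, the sketch below perturbs the sufficient statistics of the regression with the Gaussian mechanism. It is a generic baseline, not the bias-corrected estimator, confidence intervals, or SDG procedure proposed in the paper. The function names, the joint row clipping to norm 1, the add/remove-one-record neighboring relation, and the small ridge term are all assumptions made for this example.

```python
import numpy as np


def clip_rows(Z, bound=1.0):
    """Scale each row so that its L2 norm is at most `bound`."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.minimum(1.0, bound / np.maximum(norms, 1e-12))
    return Z * scale


def gdp_linear_regression(X, y, mu, rng, ridge=1e-3):
    """Noisy-sufficient-statistics regression (illustrative, hypothetical helper).

    Assumption: rows of [X, y] are clipped to norm <= 1, so adding or
    removing one record changes Z^T Z by z z^T, whose Frobenius norm is
    ||z||^2 <= 1. The Gaussian mechanism with sigma = sensitivity / mu
    then satisfies mu-GDP.
    """
    Z = clip_rows(np.column_stack([X, y]))
    S = Z.T @ Z                      # sufficient statistics for the regression
    sigma = 1.0 / mu                 # sensitivity 1 under the assumption above
    S_noisy = S + rng.normal(0.0, sigma, size=S.shape)
    S_noisy = (S_noisy + S_noisy.T) / 2.0   # symmetrize (post-processing)

    d = X.shape[1]
    XtX = S_noisy[:d, :d]
    Xty = S_noisy[:d, d]
    # The ridge term is a stabilizer chosen for this sketch, not from the paper.
    return np.linalg.solve(XtX + ridge * np.eye(d), Xty)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-0.4, 0.4, size=(5000, 2))
    y = X @ np.array([0.5, -0.3]) + rng.normal(0.0, 0.01, size=5000)
    print(gdp_linear_regression(X, y, mu=1.0, rng=rng))
```

Clipping rows jointly keeps the sensitivity bound simple but changes the regression whenever a row is actually rescaled, and the added noise biases the plain plug-in estimate. A naive estimator like this one has neither a bias correction nor valid uncertainty quantification.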

Shurong Lin, Aleksandra Slavković, Deekshith Reddy Bhoomireddy • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Regression | D3 | Average Relative MSE | 0.035 | 11 |
| Regression | D5 | Average Relative MSE | 0.016 | 11 |
| Regression | D1 | Average Relative MSE | 0.095 | 10 |
| Regression | D2 | Average Relative MSE | 0.151 | 10 |
| Regression | D4 | Average Relative MSE | 0.731 | 7 |
| Regression | D7 | Average Relative MSE | 0.584 | 7 |
| Regression | D6 | Average Relative MSE | 1.195 | 7 |
| Regression | D8 | Average Relative MSE | 1.49 | 7 |
| Synthetic Data Generation | Auction Verification | Average Runtime (seconds) | 0.0282 | 6 |
| Synthetic Data Generation | Abalone Age | Average Runtime (seconds) | 0.0662 | 6 |
Showing 10 of 23 rows
