Privacy Preserving Diffusion Models for Mixed-Type Tabular Data Generation
About
We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, reducing encoding overhead and scaling to high-dimensional datasets. To adapt differentially private (DP) training to the diffusion process, we propose two privacy-aware training strategies: an adaptive timestep sampler that aligns updates with diffusion dynamics, and a feature-aggregated loss that mitigates clipping-induced bias. Together, these enhancements improve fidelity and downstream utility without weakening privacy guarantees. On financial and medical datasets, DP-FinDiff achieves 16–42% higher utility than DP baselines at comparable privacy levels, demonstrating its promise for safe and effective data sharing in sensitive domains.
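For readers unfamiliar with where "clipping-induced bias" comes from, the following is a minimal sketch of the generic DP-SGD gradient aggregation step (per-example L2 clipping followed by Gaussian noise). It is not DP-FinDiff's implementation: the function names, the clipping norm `max_norm`, and the `noise_multiplier` parameter are assumptions chosen for illustration, and the toy gradients are made up.

```python
import math
import random


def clip_by_l2_norm(grad, max_norm):
    """Scale a gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Generic DP-SGD aggregation (illustrative, not the paper's code):
    clip each per-example gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_by_l2_norm(grad, max_norm)
        for i, value in enumerate(clipped):
            total[i] += value
    # Noise standard deviation is proportional to the clipping norm.
    sigma = noise_multiplier * max_norm
    batch_size = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch_size for t in total]


# Toy per-example gradients: the first has norm 5.0, the second norm 0.5.
grads = [[3.0, 4.0], [0.3, 0.4]]
clipped = [clip_by_l2_norm(g, 1.0) for g in grads]
noisy_mean = dp_average_gradient(grads, 1.0, 1.0, random.Random(0))
noiseless_mean = dp_average_gradient(grads, 1.0, 0.0, random.Random(0))
```

In this toy batch the first gradient is shrunk to unit norm while the second passes through unchanged, so the averaged update no longer matches the true mean gradient even before noise is added. That distortion is the kind of bias the abstract refers to.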
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | Credit | ROC AUC | 69.7 | 50 |
| Classification | Adult | ROC AUC | 0.792 | 40 |
| Binary Classification | Diabetes | AUC | 0.584 | 34 |
| Binary Classification | bank-marketing | AUC | 0.804 | 19 |
| Object Classification | Payments | ROC AUC | 0.799 | 12 |