
DiffPuter: Empowering Diffusion Models for Missing Data Imputation

About

Generative models play an important role in missing data imputation because they aim to learn the joint distribution of the full data. However, applying advanced deep generative models (such as diffusion models) to missing data imputation is challenging due to 1) the inherent incompleteness of the training data and 2) the difficulty of performing conditional inference with unconditional generative models. To address these challenges, this paper introduces DiffPuter, a tailored diffusion model combined with the Expectation-Maximization (EM) algorithm for missing data imputation. DiffPuter iteratively trains a diffusion model to learn the joint distribution of missing and observed data, and performs accurate conditional sampling to update the missing values using a tailored reversed sampling strategy. Our theoretical analysis shows that DiffPuter's training step corresponds to the maximum likelihood estimation of the data density (M-step), and its sampling step represents the Expected A Posteriori estimation of the missing values (E-step). Extensive experiments across ten diverse datasets and comparisons with 17 different imputation methods demonstrate DiffPuter's superior performance. Notably, DiffPuter achieves an average improvement of 6.94% in MAE and 4.78% in RMSE compared to the most competitive existing method.

Hengrui Zhang, Liancheng Fang, Qitian Wu, Philip S. Yu • 2024
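
The alternation between an M-step (fit a density to the currently completed data) and an E-step (re-estimate the missing values given the observed ones) can be illustrated with a toy sketch. This is not the authors' implementation: as an assumption made purely for illustration, the diffusion model is swapped for a bivariate Gaussian, only the second column has missing entries, and the names `em_impute`, `mae`, and `rmse` are hypothetical helpers. It also shows how MAE and RMSE are typically computed on the held-out entries only.

```python
def em_impute(xs, ys, n_iters=20):
    """Toy EM-style imputation for pairs (x, y) where some y are None.

    Assumption: a bivariate Gaussian stands in for the generative model.
    """
    missing = [i for i, y in enumerate(ys) if y is None]
    observed = [y for y in ys if y is not None]
    init = sum(observed) / len(observed)
    # Initialise missing entries with the mean of the observed ones.
    filled = [init if y is None else y for y in ys]
    n = len(xs)
    for _ in range(n_iters):
        # M-step: maximum likelihood estimates on the completed data.
        mu_x = sum(xs) / n
        mu_y = sum(filled) / n
        var_x = sum((x - mu_x) ** 2 for x in xs) / n
        cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, filled)) / n
        # E-step: replace each missing y with its conditional mean E[y | x].
        slope = cov_xy / var_x
        for i in missing:
            filled[i] = mu_y + slope * (xs[i] - mu_x)
    return filled


def mae(pred, truth, idx):
    """Mean absolute error over the entries listed in idx."""
    return sum(abs(pred[i] - truth[i]) for i in idx) / len(idx)


def rmse(pred, truth, idx):
    """Root mean squared error over the entries listed in idx."""
    return (sum((pred[i] - truth[i]) ** 2 for i in idx) / len(idx)) ** 0.5


if __name__ == "__main__":
    # Toy data: y depends linearly on x; two y values are hidden.
    xs = [float(i) for i in range(10)]
    truth = [2.0 * x + 1.0 for x in xs]
    hidden = [2, 7]
    ys = [None if i in hidden else t for i, t in enumerate(truth)]

    filled = em_impute(xs, ys)
    print("MAE on hidden entries: ", mae(filled, truth, hidden))
    print("RMSE on hidden entries:", rmse(filled, truth, hidden))
```

Because the toy data is exactly linear, the imputed values should approach the hidden ground truth as the iterations proceed. A diffusion-based method would replace the Gaussian fit with model training and the closed-form conditional mean with conditional sampling, which this sketch does not attempt.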

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Time Series Imputation | ETT Original (Out-of-Sample) | MAE 0.782 | 22 |
| Time Series Imputation | ETT (Original In-Sample) | MAE 0.362 | 22 |
| Image Inpainting | CelebA-HQ (test) | LPIPS 2.98 | 18 |
| Time Series Imputation | STOCK (Original Out-of-Sample) | MAE 0.406 | 11 |
| Time Series Imputation | STOCK Original (In-Sample) | MAE 0.45 | 11 |
| Time Series Imputation | PEMS-Bay (Original Out-of-Sample) | MAE 0.182 | 11 |
| Time Series Imputation | PEMS-Bay (Original In-Sample) | MAE 0.168 | 11 |
| Out-of-sample Imputation | PhysioNet (out-of-sample) | MAE 0.2755 | 9 |
| Tabular Imputation | gesture out-of-sample | MAE 0.353 | 6 |
| Tabular Imputation | adult out-of-sample | MAE 0.504 | 6 |
Showing 10 of 23 rows
