MIWAE: Deep Generative Modelling and Imputation of Incomplete Data

About

We consider the problem of handling missing data with deep latent variable models (DLVMs). First, we present a simple technique to train DLVMs when the training set contains missing-at-random data. Our approach, called MIWAE, is based on the importance-weighted autoencoder (IWAE), and maximises a potentially tight lower bound of the log-likelihood of the observed data. Compared to the original IWAE, our algorithm does not induce any additional computational overhead due to the missing data. We also develop Monte Carlo techniques for single and multiple imputation using a DLVM trained on an incomplete data set. We illustrate our approach by training a convolutional DLVM on a static binarisation of MNIST that contains 50% of missing pixels. Leveraging multiple imputation, a convolutional network trained on these incomplete digits has a test performance similar to one trained on complete data. On various continuous and binary data sets, we also show that MIWAE provides accurate single imputations, and is highly competitive with state-of-the-art methods.

Pierre-Alexandre Mattei, Jes Frellsen • 2018
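The bound and the single-imputation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a toy linear-Gaussian encoder and decoder with fixed random parameters, zero-imputation of missing entries before encoding, and a Gaussian observation model. The names `miwae_log_weights`, `miwae_bound`, `single_impute` and the `params` dictionary are hypothetical. The log-likelihood term sums over observed entries only, and the imputation uses self-normalised importance weights over the decoder means.

```python
import numpy as np

def log_normal(x, mean, std):
    """Elementwise log density of N(mean, std^2)."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def miwae_log_weights(x, mask, params, K, rng):
    """Return log importance weights (n, K) and decoder means (K, n, p).

    x:    (n, p) data; missing entries may hold any value, including NaN.
    mask: (n, p) boolean array, True where the entry is observed.
    """
    x0 = np.where(mask, x, 0.0)                      # zero-impute before encoding (assumption)
    # Toy linear encoder: q(z | x_obs) = N(x0 @ W_enc, s_enc^2 I)
    q_mean = x0 @ params["W_enc"]                    # (n, d)
    eps = rng.standard_normal((K,) + q_mean.shape)   # (K, n, d)
    z = q_mean + params["s_enc"] * eps
    # Toy linear decoder: p(x | z) = N(z @ W_dec + b, s_dec^2 I)
    x_mean = z @ params["W_dec"] + params["b"]       # (K, n, p)
    log_px = log_normal(x0, x_mean, params["s_dec"])
    log_px_obs = np.where(mask, log_px, 0.0).sum(-1) # keep observed entries only
    log_pz = log_normal(z, 0.0, 1.0).sum(-1)         # standard normal prior
    log_q = log_normal(z, q_mean, params["s_enc"]).sum(-1)
    return (log_px_obs + log_pz - log_q).T, x_mean   # log weights have shape (n, K)

def miwae_bound(log_w):
    """Monte Carlo estimate of the K-sample bound, averaged over data points."""
    K = log_w.shape[1]
    return np.mean(logsumexp(log_w, axis=1) - np.log(K))

def single_impute(x, mask, log_w, x_mean):
    """Fill missing entries with the importance-weighted mean of the decoder means."""
    w = np.exp(log_w - logsumexp(log_w, axis=1)[:, None])  # self-normalised weights, (n, K)
    x_hat = np.einsum("nk,knp->np", w, x_mean)
    return np.where(mask, x, x_hat)

# Small demonstration on synthetic data with roughly half the entries missing.
rng = np.random.default_rng(0)
n, p, d, K = 8, 5, 2, 20
params = {
    "W_enc": 0.1 * rng.standard_normal((p, d)),
    "W_dec": 0.1 * rng.standard_normal((d, p)),
    "b": np.zeros(p),
    "s_enc": 1.0,
    "s_dec": 1.0,
}
x = rng.standard_normal((n, p))
mask = rng.random((n, p)) > 0.5
x_missing = np.where(mask, x, np.nan)

log_w, x_mean = miwae_log_weights(x_missing, mask, params, K, rng)
bound = miwae_bound(log_w)
x_imputed = single_impute(x_missing, mask, log_w, x_mean)
```

In a real model the linear maps would be replaced by neural networks and the bound would be maximised by stochastic gradient ascent. Because the only change relative to a complete-data IWAE objective is the mask applied to the log-likelihood term, the sketch is consistent with the abstract's statement that missing data add no computational overhead.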

Related benchmarks

Task	Dataset	Metric	Result
Data Imputation	WINE (test)	RMSE	0.1078	205
Classification	YaleB (test)	Accuracy	100	48
Synthetic Tabular Data Generation	90 MCAR Scenarios (6 datasets × 5 missing ratios)	Alpha-Precision	6.9	21
Tabular Data Generation	MAR Benchmark 90 Scenarios: 6 datasets × 5 missing ratios (aggregated results)	Alpha-Precision	7.2	21
Data Imputation	California (test)	RMSE	0.141	16
Data Imputation	BREAST (test)	RMSE	0.0916	16
Data Imputation	blood (test)	RMSE	0.1349	16
Categorical Data Imputation	Car (test)	PFC	67.49	16
Data Imputation	Spam (test)	RMSE	0.0561	16
Data Imputation	Yeast (test)	RMSE	0.1298	16

Showing 10 of 86 rows
