not-MIWAE: Deep Generative Modelling with Missing not at Random Data

About

When a missing process depends on the missing values themselves, it needs to be explicitly modelled and taken into account while doing likelihood-based inference. We present an approach for building and fitting deep latent variable models (DLVMs) in cases where the missing process is dependent on the missing data. Specifically, a deep neural network enables us to flexibly model the conditional distribution of the missingness pattern given the data. This allows for incorporating prior information about the type of missingness (e.g. self-censoring) into the model. Our inference technique, based on importance-weighted variational inference, involves maximising a lower bound of the joint likelihood. Stochastic gradients of the bound are obtained by using the reparameterisation trick both in latent space and data space. We show on various kinds of data sets and missingness patterns that explicitly modelling the missing process can be invaluable.
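The bound described above can be sketched numerically. The following is a minimal NumPy illustration, not the authors' implementation. It draws K latent samples from the encoder by reparameterisation, draws the missing entries from the decoder by reparameterisation in data space, scores the mask with a self-censoring missing model, and averages the importance weights in log space. The linear encoder and decoder, the parameter names (`enc_w`, `dec_w`, `mis_w`, `mis_b`), the Gaussian likelihood, and the convention that `s = 1` marks an observed entry are all assumptions made for the example. NumPy has no automatic differentiation, so this only evaluates the bound; training would require an autodiff framework.

```python
import numpy as np

def log_normal(x, mean, std):
    # Elementwise log-density of a univariate Gaussian.
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def log_sigmoid(a):
    # Numerically stable log(sigmoid(a)).
    return -np.logaddexp(0.0, -a)

def logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def not_miwae_bound(x, s, params, K, rng):
    """Monte Carlo estimate of an importance-weighted bound on log p(x_obs, s).

    x: (n, p) data; values at missing entries are ignored (may be NaN).
    s: (n, p) mask, assumed 1 = observed, 0 = missing.
    params: hypothetical toy parameters (linear maps instead of deep networks).
    """
    mask = s.astype(bool)
    x_filled = np.where(mask, x, 0.0)            # zero-impute before encoding
    n, p = x_filled.shape

    # Encoder q(z | x_obs): reparameterised samples in latent space.
    q_mean = x_filled @ params["enc_w"]          # (n, d)
    q_std = np.full_like(q_mean, params["enc_std"])
    z = q_mean + q_std * rng.standard_normal((K,) + q_mean.shape)  # (K, n, d)

    # Decoder p(x | z): reparameterised samples in data space for missing entries.
    x_mean = z @ params["dec_w"]                 # (K, n, p)
    x_std = params["dec_std"]
    x_sampled = x_mean + x_std * rng.standard_normal((K, n, p))
    x_mixed = np.where(mask, x_filled, x_sampled)  # observed values + sampled missing

    # Missing model p(s | x): self-censoring, each feature's mask depends on its own value.
    logits = params["mis_w"] * (x_mixed - params["mis_b"])
    log_p_s = np.sum(s * log_sigmoid(logits) + (1 - s) * log_sigmoid(-logits), axis=-1)

    log_p_xo = np.sum(s * log_normal(x_filled, x_mean, x_std), axis=-1)
    log_p_z = np.sum(log_normal(z, 0.0, 1.0), axis=-1)
    log_q_z = np.sum(log_normal(z, q_mean, q_std), axis=-1)

    # Log importance weights, then log of their average over the K samples.
    log_w = log_p_s + log_p_xo + log_p_z - log_q_z   # (K, n)
    return float(np.mean(logsumexp(log_w, axis=0) - np.log(K)))

# Usage on synthetic data where larger values are more likely to be missing.
rng = np.random.default_rng(0)
n, p, d = 64, 3, 2
x = rng.standard_normal((n, p))
s = (rng.random((n, p)) < 1.0 / (1.0 + np.exp(x))).astype(float)
params = {
    "enc_w": 0.1 * rng.standard_normal((p, d)), "enc_std": 1.0,
    "dec_w": 0.1 * rng.standard_normal((d, p)), "dec_std": 1.0,
    "mis_w": -np.ones(p), "mis_b": np.zeros(p),
}
bound = not_miwae_bound(x, s, params, K=20, rng=np.random.default_rng(1))
```

Because the missing entries are sampled from the decoder itself, the decoder density of those entries cancels in the importance weights, which is why only the observed-entry likelihood appears in `log_w`. Setting `mis_w` to zero makes the mask independent of the data, in which case the missing model contributes only a constant to the bound.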

Niels Bruun Ipsen, Pierre-Alexandre Mattei, Jes Frellsen • 2020

Related benchmarks

Task	Dataset	Result
Classification	pima	AUC 0.753	26
Time Series Imputation	ETT (Original In-Sample)	MAE 0.637	22
Time Series Imputation	ETT Original (Out-of-Sample)	MAE 1.311	22
Classification	banknote	AUC 83.9	18
Classification	Rice	AUC 0.959	18
Classification	Breastcancer	AUC 97.1	18
Execution time measurement	BankNote 50% MNAR	Training Time 13.381	15
Execution time measurement	Pima 50% MNAR	Training Time (s) 20.421	15
Execution time measurement	Rice 50% MNAR	Training Time (s) 28.51	15
Execution time measurement	Breast Cancer (50% MNAR)	Training Time (s) 65.371	15

Showing 10 of 64 rows
