
Handling Incomplete Heterogeneous Data using VAEs

About

Variational autoencoders (VAEs), like other generative models, have been shown to be efficient and accurate at capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with data missing at random), both of which are common in real-world applications. In this paper, we propose a general framework for designing VAEs suitable for fitting incomplete heterogeneous data. The proposed HI-VAE includes likelihood models for real-valued, positive real-valued, interval, categorical, ordinal, and count data, and allows accurate estimation (and potentially imputation) of missing data. Furthermore, HI-VAE shows competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data.
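
To make the idea of per-type likelihood models with missing entries concrete, here is a minimal sketch and not the authors' implementation. It assumes four of the data types named in the abstract (real-valued, positive real-valued, count, categorical), picks a common likelihood for each (Gaussian, log-normal, Poisson, softmax categorical), and sums the log-likelihood over observed entries only. The function names, the `params` layout, and the hard-coded parameter values are all hypothetical; in a VAE these parameters would come from a decoder network.

```python
import math
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Log-density of a Gaussian, used here for real-valued attributes."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def lognormal_logpdf(x, mean, std):
    """Log-density of a log-normal, used here for positive real-valued attributes."""
    # Gaussian on log(x) plus the change-of-variables term -log(x).
    return gaussian_logpdf(math.log(x), mean, std) - math.log(x)

def poisson_logpmf(k, rate):
    """Log-probability of a Poisson, used here for count attributes."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def categorical_logpmf(k, logits):
    """Log-probability of class k under a softmax over logits."""
    logits = np.asarray(logits, dtype=float)
    m = logits.max()
    log_norm = m + math.log(np.exp(logits - m).sum())  # stable log-sum-exp
    return float(logits[k] - log_norm)

def masked_log_likelihood(row, observed, params):
    """Sum per-attribute log-likelihoods, skipping unobserved entries.

    row:      attribute values (missing entries may hold any placeholder)
    observed: booleans, True where the attribute was observed
    params:   (kind, parameter dict) per attribute; hypothetical layout
    """
    total = 0.0
    for value, is_observed, (kind, p) in zip(row, observed, params):
        if not is_observed:
            continue  # missing entries contribute nothing to the objective
        if kind == "real":
            total += gaussian_logpdf(value, p["mean"], p["std"])
        elif kind == "positive":
            total += lognormal_logpdf(value, p["mean"], p["std"])
        elif kind == "count":
            total += poisson_logpmf(value, p["rate"])
        elif kind == "categorical":
            total += categorical_logpmf(value, p["logits"])
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
    return total

# Example row: the second (positive-valued) attribute is missing.
row = [0.0, float("nan"), 2, 1]
observed = [True, False, True, True]
params = [
    ("real", {"mean": 0.0, "std": 1.0}),
    ("positive", {"mean": 0.0, "std": 1.0}),
    ("count", {"rate": 2.0}),
    ("categorical", {"logits": [0.0, 0.0]}),
]
ll = masked_log_likelihood(row, observed, params)
```

Because the missing attribute is skipped rather than filled with a placeholder, its value never influences the objective. Interval and ordinal attributes, which the abstract also lists, are left out of this sketch.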

Alfredo Nazabal, Pablo M. Olmos, Zoubin Ghahramani, Isabel Valera • 2018

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| scRNA-seq imputation | human fetus cell atlas 25% low-biased missingness (MNAR) | RMSE 0.89 | 14 |
| scRNA-seq imputation | Human heart cell atlas (50% MCAR) | RMSE 8.832 | 14 |
| scRNA-seq imputation | human fetus cell atlas 50% MCAR | RMSE 1.719 | 13 |
| Classification | Wine 30% MNAR | F1 Score 89.1 | 12 |
| Classification | Bank 30% MCAR | F1 Score 77.3 | 12 |
| Classification | Adult 30% MCAR | F1 Score 24.4 | 12 |
| Classification | Breast 30% MCAR | F1 Score 44.7 | 12 |
| Classification | Adult 30% MAR | F1 Score 24.5 | 12 |
| Classification | Aust. 30% MCAR | F1 Score 58.7 | 12 |
| Classification | Adult 30% MNAR | F1 Score 20.2 | 12 |

Showing 10 of 28 rows
