IVGAE: Handling Incomplete Heterogeneous Data with a Variational Graph Autoencoder

About

Handling missing data remains a fundamental challenge in real-world tabular datasets, especially when data are heterogeneous with both numerical and categorical features. Existing imputation methods often fail to capture complex structural dependencies and handle heterogeneous data effectively. We present IVGAE, a Variational Graph Autoencoder framework for robust imputation of incomplete heterogeneous data. IVGAE constructs a bipartite graph to represent sample-feature relationships and applies graph representation learning to model structural dependencies. A key innovation is its dual-decoder architecture, where one decoder reconstructs feature embeddings and the other models missingness patterns, providing structural priors aware of missing mechanisms. To better encode categorical variables, we introduce a Transformer-based heterogeneous embedding module that avoids high-dimensional one-hot encoding. Extensive experiments on 16 real-world datasets show that IVGAE achieves consistent improvements in RMSE and downstream F1 across MCAR, MAR, and MNAR missing scenarios under 30% missing rates. Code and data are available at: https://github.com/echoid/IVGAE.

Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal • 2025
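
The abstract describes representing an incomplete table as a bipartite sample-feature graph and evaluating imputations by RMSE under a 30% MCAR setting. The following is a minimal sketch of those three ideas using NumPy only. It is not the authors' implementation (see the linked repository for that). The function names `build_bipartite_edges`, `mcar_mask`, and `masked_rmse` are hypothetical, and the sketch assumes that missing cells are marked with NaN and that all features are already numeric.

```python
import numpy as np


def build_bipartite_edges(X):
    """Turn an incomplete table into a bipartite edge list.

    Assumption: X is a 2-D float array where NaN marks a missing cell.
    Every observed cell (i, j) becomes an edge between sample node i and
    feature node j, carrying the cell value as an edge attribute.
    """
    observed = ~np.isnan(X)
    sample_idx, feature_idx = np.nonzero(observed)
    return sample_idx, feature_idx, X[sample_idx, feature_idx]


def mcar_mask(shape, missing_rate, seed=0):
    """Draw an MCAR mask: each cell is hidden independently with the same
    probability, regardless of its value or of any other cell."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) < missing_rate


def masked_rmse(X_true, X_imputed, mask):
    """RMSE computed only over the cells that were hidden (mask == True)."""
    diff = (X_true - X_imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy usage: two samples, three features, two missing cells.
X = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0]])
rows, cols, vals = build_bipartite_edges(X)
# Edges: (0, 0), (0, 2), (1, 1), (1, 2) with values 1, 3, 5, 6.
```

Storing only the observed cells as edges means missing entries are simply absent from the graph, so imputation can be framed as predicting attributes for the missing edges. MAR and MNAR masks would need a mask that depends on observed or unobserved values respectively, which this sketch does not cover.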

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Classification | Adult 30% MCAR | F1 Score: 25 | 12 |
| Classification | Breast 30% MCAR | F1 Score: 46.1 | 12 |
| Classification | Adult 30% MAR | F1 Score: 30.3 | 12 |
| Classification | Adult 30% MNAR | F1 Score: 33.5 | 12 |
| Classification | Wine 30% MNAR | F1 Score: 93.6 | 12 |
| Classification | Aust. 30% MCAR | F1 Score: 65.8 | 12 |
| Classification | Bank 30% MCAR | F1 Score: 78.1 | 12 |
| Missing Data Imputation | Adult 30% MCAR (test) | Average Error: 0.15 | 11 |
| Missing Data Imputation | Aust. 30% MCAR | Average Error: 0.129 | 11 |
| Missing Data Imputation | Bank 30% MCAR | Average Error: 0.095 | 11 |
Showing 10 of 25 rows
