Graphical Models for Processing Missing Data

About

This paper reviews recent advances in missing data research using graphical models to represent multivariate dependencies. We first examine the limitations of traditional frameworks from three perspectives: transparency, estimability, and testability. We then show how procedures based on graphical models can overcome these limitations and provide meaningful performance guarantees even when data are Missing Not At Random (MNAR). In particular, we identify conditions that guarantee consistent estimation in broad categories of missing data problems, and derive procedures for implementing this estimation. Finally, we derive testable implications for missing data models in both the MAR (Missing At Random) and MNAR categories.
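The claim that consistent estimation is possible even under MNAR can be illustrated with a small simulation. The following is a minimal sketch under an assumed, hypothetical model that is not taken from the paper. Binary Y causes X. Whether X is missing depends on Y, and Y is itself sometimes missing, so the mechanism is MNAR: X's missingness depends on a partially observed variable. Y's missingness is assumed independent of everything. Because X is independent of both missingness indicators given Y, and Y is independent of its own indicator, P(X) can be recovered as the sum over y of P(X | y, R_X=0, R_Y=0) · P(y | R_Y=0). A naive complete-case average of X is biased. The names `simulate`, `naive_p_x1`, and `recovered_p_x1` are illustrative only.

```python
import random

def simulate(n, seed=0):
    """Draw n rows of (x, y); a missing entry is encoded as None.

    Hypothetical data-generating process (an assumption, not from the paper):
      Y ~ Bernoulli(0.5);  X | Y ~ Bernoulli(0.8 if Y == 1 else 0.3)
      R_X: X is missing with prob 0.6 if Y == 1, else 0.1 (depends on Y)
      R_Y: Y is missing with prob 0.3, independently of everything else
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        x = 1 if rng.random() < (0.8 if y else 0.3) else 0
        x_obs = None if rng.random() < (0.6 if y else 0.1) else x
        y_obs = None if rng.random() < 0.3 else y
        rows.append((x_obs, y_obs))
    return rows

def naive_p_x1(rows):
    """Complete-case estimate of P(X=1); biased because R_X depends on Y."""
    xs = [x for x, _ in rows if x is not None]
    return sum(xs) / len(xs)

def recovered_p_x1(rows):
    """Estimate P(X=1) = sum_y P(X=1 | y, R_X=0, R_Y=0) * P(y | R_Y=0).

    Valid under the assumed model: X is independent of (R_X, R_Y) given Y,
    and Y is independent of R_Y.
    """
    ys = [y for _, y in rows if y is not None]  # rows with R_Y = 0
    total = 0.0
    for y_val in (0, 1):
        p_y = sum(1 for y in ys if y == y_val) / len(ys)
        # Rows where both X and Y are observed and Y equals y_val.
        xs = [x for x, y in rows if x is not None and y == y_val]
        total += (sum(xs) / len(xs)) * p_y
    return total
```

With `rows = simulate(200_000)`, `naive_p_x1(rows)` should come out near 0.45, while `recovered_p_x1(rows)` should be close to the true value P(X=1) = 0.5·0.8 + 0.5·0.3 = 0.55. The recovery formula uses only quantities estimable from the observed rows; it works here because of the assumed independences, not because of any property of the estimator itself.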

Karthika Mohan, Judea Pearl • 2018

Related benchmarks

Task	Dataset	Result (MAE = mean absolute error)
Biomarker-level imputation	Semi-synthetic biomarkers 30% missingness	MAE 2.755	7
Biomarker-level imputation	Semi-synthetic biomarkers 50% missingness	MAE 2.833	7
Biomarker-level imputation	Semi-synthetic biomarkers 80% missingness	MAE 2.875	7
Imputation	Semi-synthetic EHR dataset MNAR 30% (test)	MAE 5	6
Imputation	Semi-synthetic EHR dataset MNAR 50% (test)	MAE 5	6
Imputation	Semi-synthetic EHR dataset MNAR 80% (test)	MAE 5	6
Imputation	Semi-synthetic EHR dataset Pooled 30-80% MNAR (summary)	Mean rank 3.92	6

