Graphical Models for Processing Missing Data
About
This paper reviews recent advances in missing data research using graphical models to represent multivariate dependencies. We first examine the limitations of traditional frameworks from three different perspectives: transparency, estimability, and testability. We then show how procedures based on graphical models can overcome these limitations and provide meaningful performance guarantees even when data are Missing Not At Random (MNAR). In particular, we identify conditions that guarantee consistent estimation in broad categories of missing data problems, and derive procedures for implementing this estimation. Finally, we derive testable implications for missing data models in both the Missing At Random (MAR) and MNAR categories.
Karthika Mohan, Judea Pearl • 2018
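To make the MNAR setting mentioned in the abstract concrete, the following toy simulation compares the mean of the observed values when missingness is independent of the data with the case where missingness depends on the value itself. It is only an illustrative sketch: the function name `simulate`, the standard normal variable, and the logistic missingness mechanism are assumptions chosen for this example, not anything taken from the paper, and the code does not implement the paper's graphical-model procedures.

```python
import math
import random
from statistics import fmean


def simulate(n=50_000, seed=0):
    """Return (true mean, observed mean under random deletion, observed mean under MNAR)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Missingness independent of the data: every value is dropped
    # with the same probability (0.5).
    random_obs = [v for v in x if rng.random() > 0.5]

    # MNAR (hypothetical mechanism): the probability of being dropped is a
    # logistic function of the value itself, so large values are
    # under-represented among the observed cases.
    mnar_obs = [v for v in x if rng.random() > 1.0 / (1.0 + math.exp(-2.0 * v))]

    return fmean(x), fmean(random_obs), fmean(mnar_obs)


true_mean, random_mean, mnar_mean = simulate()
print(f"true mean:                 {true_mean:.3f}")
print(f"observed, random deletion: {random_mean:.3f}")
print(f"observed, MNAR deletion:   {mnar_mean:.3f}")
```

Under these assumptions, the observed mean stays close to the true mean when deletion is random, while the MNAR observed mean is pulled well below it. This is the kind of bias that motivates asking when consistent estimation is possible at all.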
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Biomarker-level imputation | Semi-synthetic biomarkers 80% missingness | MAE (Mean Absolute Error) | 2.875 | 7 |
| Biomarker-level imputation | Semi-synthetic biomarkers 30% missingness | MAE | 2.755 | 7 |
| Biomarker-level imputation | Semi-synthetic biomarkers 50% missingness | MAE | 2.833 | 7 |
| Imputation | Semi-synthetic EHR dataset MNAR 30% (test) | MAE | 5 | 6 |
| Imputation | Semi-synthetic EHR dataset MNAR 50% (test) | MAE | 5 | 6 |
| Imputation | Semi-synthetic EHR dataset MNAR 80% (test) | MAE | 5 | 6 |
| Imputation | Semi-synthetic EHR dataset Pooled 30-80% MNAR (summary) | Mean Rank | 3.92 | 6 |