Deeply-Learned Generalized Linear Models with Missing Data

About

Deep Learning (DL) methods have dramatically increased in popularity in recent years, with significant growth in their application to supervised learning problems in the biomedical sciences. However, the greater prevalence and complexity of missing data in modern biomedical datasets present significant challenges for DL methods. Here, we provide a formal treatment of missing data in the context of deeply learned generalized linear models, a supervised DL architecture for regression and classification problems. We propose a new architecture, dlglm, that is one of the first to be able to flexibly account for both ignorable and non-ignorable patterns of missingness in input features and response at training time. We demonstrate through statistical simulation that our method outperforms existing approaches for supervised learning tasks when data are missing not at random (MNAR). We conclude with a case study of a Bank Marketing dataset from the UCI Machine Learning Repository, in which we predict whether clients subscribed to a product based on phone survey data. Supplementary materials for this article are available online.
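
The distinction between ignorable and non-ignorable missingness mentioned in the abstract can be made concrete with a small simulation. The sketch below is not the dlglm implementation and does not reproduce the authors' simulation design. The function names `mcar_mask` and `mnar_mask`, the logistic self-masking mechanism, and the `slope` parameter are all assumptions chosen for illustration. It shows that under missing completely at random (MCAR) masking the observed values remain representative, whereas under MNAR masking the observed values are systematically shifted.

```python
import numpy as np


def mcar_mask(x, rate, rng):
    """Return a boolean mask (True = missing) that ignores the data values."""
    return rng.random(x.shape) < rate


def mnar_mask(x, rate, rng, slope=2.0):
    """Return a boolean mask (True = missing) where larger values of a
    feature are more likely to be missing (hypothetical self-masking model).

    The intercept logit(rate) makes the realized missing fraction only
    approximately equal to `rate`; it is close for roughly symmetric data.
    """
    z = (x - x.mean(axis=0)) / x.std(axis=0)      # standardize each column
    intercept = np.log(rate / (1.0 - rate))       # logit of the target rate
    prob_missing = 1.0 / (1.0 + np.exp(-(intercept + slope * z)))
    return rng.random(x.shape) < prob_missing


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(20_000, 3))              # synthetic features, true mean 0

    m_mcar = mcar_mask(x, 0.5, rng)
    m_mnar = mnar_mask(x, 0.5, rng)

    # Compare the mean of the values that remain observed under each mechanism.
    print("missing fraction (MCAR):", m_mcar.mean())
    print("missing fraction (MNAR):", m_mnar.mean())
    print("observed mean (MCAR):", x[~m_mcar].mean())
    print("observed mean (MNAR):", x[~m_mnar].mean())
```

Because the MNAR mask depends on the unobserved values themselves, a model fitted only to the observed entries sees a biased sample, which is the setting in which the abstract reports gains from modelling the missingness mechanism.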

David K Lim, Naim U Rashid, Junier B Oliva, Joseph G Ibrahim • 2022

Related benchmarks

Task	Dataset	Metric	Value
Classification	pima	AUC	0.748	26
Classification	banknote	AUC	85.3	18
Classification	Rice	AUC	0.973	18
Classification	Breastcancer	AUC	97.9	18
Execution time measurement	Breast Cancer (50% MNAR)	Training Time (s)	34.266	15
Execution time measurement	Pima 50% MNAR	Training Time (s)	26.196	15
Execution time measurement	BankNote 50% MNAR	Training Time	52.333	15
Execution time measurement	Rice 50% MNAR	Training Time (s)	122.7	15
Binary Classification	Synthetic 50% MCAR (test)	AUC	73.8	7
Classification	Synthetic Dataset 60% MNAR (test)	AUC	76.8	7

Showing 10 of 21 rows
