GAIN: Missing Data Imputation using Generative Adversarial Nets

About

We propose a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods.

Jinsung Yoon, James Jordon, Mihaela van der Schaar • 2018
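The data flow described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation. The helper names (`complete`, `make_hint`, `toy_generator`), the `hint_rate` parameter, the noise range, and the particular hint construction (reveal a mask entry with some probability, otherwise emit 0.5) are all assumptions made for illustration; the abstract does not specify them. The generator here is a hypothetical stand-in that fills with observed column means, where the real method would use a trained network.

```python
import numpy as np

def complete(x, m, g_out):
    """Keep observed entries of x; fill missing ones from the generator output."""
    return m * x + (1.0 - m) * g_out

def make_hint(m, hint_rate, rng):
    """Assumed hint scheme: reveal each mask entry with probability hint_rate,
    otherwise emit 0.5 (meaning "no information about this component")."""
    b = (rng.random(m.shape) < hint_rate).astype(float)
    return b * m + 0.5 * (1.0 - b)

def toy_generator(x_tilde, m):
    """Hypothetical stand-in for G: predicts the observed column mean everywhere."""
    col_sum = (m * x_tilde).sum(axis=0)
    col_cnt = np.maximum(m.sum(axis=0), 1.0)  # avoid division by zero
    return np.broadcast_to(col_sum / col_cnt, x_tilde.shape)

rng = np.random.default_rng(0)
x = rng.random((6, 4))                          # toy data in [0, 1)
m = (rng.random(x.shape) < 0.7).astype(float)   # 1 = observed, 0 = missing
z = rng.uniform(0.0, 0.01, size=x.shape)        # noise placed in missing slots
x_tilde = m * x + (1.0 - m) * z                 # partially observed input to G
x_hat = complete(x, m, toy_generator(x_tilde, m))  # completed vector
h = make_hint(m, hint_rate=0.9, rng=rng)        # extra input for D

# D would receive (x_hat, h) and be trained to predict m component-wise,
# i.e. to tell observed components from imputed ones.
```

In this sketch the hint equals the mask wherever it is not 0.5, so D is told the truth about most components and has to judge only the remaining ones from the imputed values themselves.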

Related benchmarks

Task	Dataset	Result
Classification	Musk2 downstream	Balanced Accuracy 93.9	45
Missing Imputation	MIMIC-III Laboratory Data subset (n=5000, p=24) under MAR	RMSE 0.061	40
Data Imputation	Gliomas	Accuracy 84.13	30
Data Imputation	NPHA	Accuracy 60.68	30
Data Imputation	Cancer	Accuracy 42.52	28
Missing Data Imputation	eICU Collaborative Research Database Simulation of Blockwise Missing Data n=5000, p=40	RMSE 0.076	24
Time Series Imputation	PEMS-BAY Block missing (test)	MAE 2.18	21
Time Series Imputation	PEMS-BAY Point missing (test)	MAE 1.88	21
Time Series Imputation	METR-LA Point missing (test)	MAE 2.83	21
Missing Data Imputation	eICU Collaborative Research Database Simulation of Blockwise Missing Data n=5000, p=40	RMSE 0.079	16

Showing 10 of 135 rows

