Transformed Distribution Matching for Missing Value Imputation

About

We study the problem of imputing missing values in a dataset, which has important applications in many domains. The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values accordingly. In this paper, by leveraging the fact that any two batches of data with missing values come from the same data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching them distributionally. To learn the transformations and impute the missing values simultaneously, a simple and well-motivated algorithm is proposed. Our algorithm has fewer hyperparameters to fine-tune and generates high-quality imputations regardless of how missing values are generated. Extensive experiments over a large number of datasets and competing benchmark algorithms show that our method achieves state-of-the-art performance.
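The objective described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the invertible architecture or the distributional discrepancy, so a single affine coupling layer and the energy distance are used here as stand-ins. All function and parameter names (`affine_coupling`, `energy_distance`, `fill`, `matching_loss`, `w`, `b`) are hypothetical.

```python
import numpy as np


def affine_coupling(x, w, b):
    """Invertible map (stand-in for a deep invertible network).

    The first d features pass through unchanged; the remaining features
    are scaled and shifted by a function of the first d features.
    """
    d = x.shape[1] // 2
    x1, x2 = x[:, :d], x[:, d:]
    h = np.tanh(x1 @ w + b)                 # shape (n, 2 * (D - d))
    log_s, t = np.split(h, 2, axis=1)
    return np.concatenate([x1, x2 * np.exp(log_s) + t], axis=1)


def affine_coupling_inverse(z, w, b):
    """Exact inverse of affine_coupling, showing the map is invertible."""
    d = z.shape[1] // 2
    z1, z2 = z[:, :d], z[:, d:]
    h = np.tanh(z1 @ w + b)
    log_s, t = np.split(h, 2, axis=1)
    return np.concatenate([z1, (z2 - t) * np.exp(-log_s)], axis=1)


def energy_distance(a, b):
    """Energy distance between two samples (assumed discrepancy)."""
    def mean_dist(u, v):
        return np.linalg.norm(u[:, None, :] - v[None, :, :], axis=2).mean()
    return 2.0 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b)


def fill(x, observed, imputations):
    """Keep observed entries of x; use current imputations elsewhere."""
    return np.where(observed, x, imputations)


def matching_loss(x1, obs1, imp1, x2, obs2, imp2, w, b):
    """Discrepancy between two imputed batches in the latent space."""
    z1 = affine_coupling(fill(x1, obs1, imp1), w, b)
    z2 = affine_coupling(fill(x2, obs2, imp2), w, b)
    return energy_distance(z1, z2)
```

In a full training loop, the imputations (`imp1`, `imp2`) and the transformation parameters (`w`, `b`) would both be updated to reduce this loss, for example with an automatic-differentiation library. That step is omitted here.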

He Zhao, Ke Sun, Amir Dezfouli, Edwin Bonilla • 2023

Related benchmarks

Task	Dataset	Result
Data Imputation	WINE (test)	RMSE 0.132	205
Time Series Imputation	ETTh1	MAE 0.749	187
Time Series Imputation	ETTm1	MSE 1.003	177
Time Series Imputation	ETTm2	MSE 0.998	143
Classification	33 datasets missing rate <= 10% (test)	AUC 86.64	65
Time Series Imputation	Exchange	MSE 0.969	54
Classification	10 Datasets Missing rate > 10% (test)	AUC 80.06	50
Data Imputation	NPHA	Accuracy 66.29	30
Data Imputation	Gliomas	Accuracy 79.8	30
Data Imputation	Cancer	Accuracy 32.67	28

Showing 10 of 35 rows
