Missing Data Imputation by Reducing Mutual Information with Rectified Flows

About

This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and the corresponding missingness mask. Inspired by GAN-based approaches that train generators to decrease the predictability of missingness patterns, our method explicitly targets this reduction in mutual information. Specifically, our algorithm iteratively minimizes the KL divergence between the joint distribution of the imputed data and missingness mask, and the product of their marginals from the previous iteration. We show that the optimal imputation under this framework can be achieved by solving an ODE whose velocity field minimizes a rectified flow training objective. We further illustrate that some existing imputation techniques can be interpreted as approximate special cases of our mutual-information-reducing framework. Comprehensive experiments on synthetic and real-world datasets validate the efficacy of our proposed approach, demonstrating its superior imputation performance. Our implementation is available at https://github.com/yujhml/MIRI-Imputation.
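To make the "rectified flow training objective" and the ODE mentioned above concrete, here is a minimal NumPy sketch of the generic rectified flow recipe: regress a velocity field onto the straight-line displacement between paired samples, then integrate the ODE with Euler steps. It is not taken from the linked repository and does not reproduce the paper's algorithm. In particular, it omits the conditioning on the missingness mask and the iterative mutual-information reduction. The function names `rectified_flow_loss` and `euler_integrate`, and the callable `velocity(x, t)`, are hypothetical names chosen for illustration.

```python
import numpy as np


def rectified_flow_loss(velocity, x0, x1, rng):
    """Monte Carlo estimate of the generic rectified flow objective.

    velocity: callable (x, t) -> array shaped like x (hypothetical interface).
    x0, x1:   paired source and target samples, shape (n, d).
    """
    # One random time in [0, 1) per sample pair.
    t = rng.uniform(size=(x0.shape[0], 1))
    # Point on the straight line between each source and target sample.
    xt = t * x1 + (1.0 - t) * x0
    # The regression target is the constant displacement along that line.
    target = x1 - x0
    residual = velocity(xt, t) - target
    return float(np.mean(np.sum(residual ** 2, axis=1)))


def euler_integrate(velocity, x0, n_steps=100):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity(x, t)
    return x
```

As a sanity check under these assumptions: if every target is its source shifted by a fixed vector, a velocity field that returns that vector everywhere has a loss of numerically zero, and integrating it from `x0` recovers `x1`.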

Jiahao Yu, Qizhen Ying, Leyang Wang, Ziyue Jiang, Song Liu • 2025

Related benchmarks

Task                                Dataset       Result                                 Rank
Sample Generation                   Concrete      Standardized Energy Distance: 20.09    8
Sample Generation                   Housing       Standardized Energy Distance: 19.78    8
Sample Generation                   Stock         Standardized Energy Distance: 67.41    8
Sample Generation                   Forest        Standardized Energy Distance: 9.36     8
Sample Generation                   windspeed     Standardized Energy Distance: 15.34    8
Tabular Synthetic Data Generation   Parkinsons    --                                     8
Sample Generation                   SCM1d         Standardized Energy Distance: 10       7
Sample Generation                   SCM20d        Standardized Energy Distance: 10       7
Sample Generation                   pumadyn32nm   Standardized Energy Distance: 10       7
Sample Generation                   allergens     Standardized Energy Distance: 125.1    7