SMIL: Multimodal Learning with Severely Missing Modality

About

A common assumption in multimodal learning is the completeness of training data, i.e., full modalities are available in all training examples. Although there exists research endeavor in developing novel methods to tackle the incompleteness of testing data, e.g., modalities are partially missing in testing examples, few of them can handle incomplete training modalities. The problem becomes even more challenging if considering the case of severely missing, e.g., 90% training examples may have incomplete modalities. For the first time in the literature, this paper formally studies multimodal learning with missing modality in terms of flexibility (missing modalities in training, testing, or both) and efficiency (most training data have incomplete modality). Technically, we propose a new method named SMIL that leverages Bayesian meta-learning in uniformly achieving both objectives. To validate our idea, we conduct a series of experiments on three popular benchmarks: MM-IMDb, CMU-MOSI, and avMNIST. The results prove the state-of-the-art performance of SMIL over existing methods and generative baselines including autoencoders and generative adversarial networks. Our code is available at https://github.com/mengmenm/SMIL.

Mengmeng Ma, Jian Ren, Long Zhao, Sergey Tulyakov, Cathy Wu, Xi Peng• 2021

Related benchmarks

Task	Dataset	Result
Multimodal Sentiment Analysis	CMU-MOSEI (test)	--	401
Multimodal Sentiment Analysis	CMU-MOSI (test)	--	385
Survival Prediction	TCGA-LUAD	C-index0.695	195
Survival Prediction	TCGA-UCEC	C-index0.74	179
Emotion Recognition	IEMOCAP	--	151
Multimodal Multilabel Classification	MM-IMDB (test)	--	94
Readmission prediction	MIMIC IV	AUC-ROC0.6894	74
Mortality Prediction	eICU	AUC-PRC0.4066	53
Multimodal Sentiment Analysis	MOSEI (test)	--	49
Arousal Emotion Recognition	DEAP (test)	Accuracy88.35	47

Showing 10 of 28 rows

Other info

Follow for update

@wizwand_team Discord