Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning

About

Supervised multi-modal learning involves mapping multiple modalities to a target label. Previous studies in this field have concentrated on capturing, in isolation, either the inter-modality dependencies (the relationships between different modalities and the label) or the intra-modality dependencies (the relationships within a single modality and the label). We argue that these conventional approaches, which rely solely on either inter- or intra-modality dependencies, may not be optimal in general. We view the multi-modal learning problem through the lens of generative models, where we consider the target as a source of multiple modalities and the interaction between them. To that end, we propose the inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies, leading to more accurate predictions. We evaluate our approach on real-world healthcare and vision-and-language datasets with state-of-the-art models, demonstrating superior performance over traditional methods that focus on only one type of modality dependency.
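
The abstract does not say how the two kinds of dependencies are integrated. As a rough illustration only, the sketch below assumes a product-of-experts style combination: each unimodal (intra-modality) classifier and one joint (inter-modality) classifier emits per-class logits, their log-probabilities are summed, and the result is renormalized. The function names, the combination rule, and the example logits are all hypothetical and are not taken from the authors' code.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def combine_experts(intra_logits, inter_logits):
    """Sum per-class log-probabilities from every expert, then renormalize.

    intra_logits: list of logit lists, one per unimodal model (assumed).
    inter_logits: logits from a joint model that sees all modalities (assumed).
    Returns a list of class probabilities that sums to 1.
    """
    experts = [log_softmax(l) for l in intra_logits]
    experts.append(log_softmax(inter_logits))
    summed = [sum(per_class) for per_class in zip(*experts)]
    return [math.exp(v) for v in log_softmax(summed)]


# Hypothetical logits for a 3-class problem with two modalities.
image_logits = [2.0, 0.5, 0.1]
text_logits = [0.2, 1.5, 0.3]
joint_logits = [1.0, 1.5, 0.1]

probs = combine_experts([image_logits, text_logits], joint_logits)
prediction = max(range(len(probs)), key=probs.__getitem__)
```

Under this assumption, no single expert decides the outcome: here the image-only model favors class 0, while the text-only and joint models together shift the combined prediction to class 1.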

Divyam Madaan, Taro Makino, Sumit Chopra, Kyunghyun Cho • 2024

Related benchmarks

Task                               Dataset                  Metric               Result   Rank
Classification                     AV-MNIST                 Accuracy             72.38    24
Natural Language Visual Reasoning  NLVR2 (test)             Accuracy             85.36    16
ICD-9 code prediction              MIMIC-III v1.4 (test)    Accuracy (140-239)   91.58    5
Mortality Prediction               MIMIC-III v1.4 (test)    Accuracy             78.1     5
Visual Question Answering          VQA-VS (IID)             VQA Score            68.63    5
Visual Question Answering          VQA-VS (OOD)             VQA Score            48.74    5
