Jointly Modeling Inter- & Intra-Modality Dependencies for Multi-modal Learning

About

Supervised multi-modal learning involves mapping multiple modalities to a target label. Previous studies in this field have concentrated on capturing, in isolation, either the inter-modality dependencies (the relationships between different modalities and the label) or the intra-modality dependencies (the relationships within a single modality and the label). We argue that these conventional approaches, which rely solely on either inter- or intra-modality dependencies, may not be optimal in general. We view the multi-modal learning problem through the lens of generative models, where we consider the target as a source of multiple modalities and the interaction between them. To that end, we propose the inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies, leading to more accurate predictions. We evaluate our approach on real-world healthcare and vision-and-language datasets with state-of-the-art models, demonstrating superior performance over traditional methods that focus on only one type of modality dependency.

Divyam Madaan, Taro Makino, Sumit Chopra, Kyunghyun Cho • 2024
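
The abstract says I2M2 "captures and integrates" inter- and intra-modality dependencies but does not spell out the integration rule. The sketch below is one plausible reading, not the authors' implementation: it assumes one classifier per modality (intra-modality) plus one joint classifier over all modalities (inter-modality), and combines them by summing log-probabilities and renormalizing, in the style of a product of experts. The function names and the example logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def combine_experts(per_modality_logits, joint_logits):
    """Combine unimodal and joint classifiers into one class distribution.

    ASSUMPTION: the combination rule (sum of log-probabilities, then
    renormalize) is illustrative and is not stated in the abstract.
    """
    experts = [log_softmax(l) for l in per_modality_logits]
    experts.append(log_softmax(joint_logits))
    # Sum the log-probabilities class by class across all experts.
    summed = [sum(class_vals) for class_vals in zip(*experts)]
    # Renormalize so the result is a valid probability distribution.
    return [math.exp(v) for v in log_softmax(summed)]


# Hypothetical logits for a two-class problem with two modalities.
image_logits = [2.0, 0.5]   # intra-modality expert for modality 1
text_logits = [0.2, 0.1]    # intra-modality expert for modality 2
joint_logits = [1.0, 1.5]   # inter-modality expert over both modalities

probs = combine_experts([image_logits, text_logits], joint_logits)
print(probs)
```

In this toy case the joint expert mildly prefers the second class, but the image expert's stronger preference for the first class dominates the combined prediction, which illustrates why relying on a single type of dependency can give a different answer than using both.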

Related benchmarks

Task | Dataset | Result | Rank
Natural Language Visual Reasoning | NLVR2 (test) | Accuracy 85.36 | 16
Classification | AV-MNIST | Accuracy 72.38 | 12
ICD-9 code prediction | MIMIC-III v1.4 (test) | Accuracy (140-239) 91.58 | 5
Mortality Prediction | MIMIC-III v1.4 (test) | Accuracy 78.1 | 5
Visual Question Answering | VQA-VS (IID) | VQA Score 68.63 | 5
Visual Question Answering | VQA-VS (OOD) | VQA Score 48.74 | 5
