
Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework

About

The recent explosion of interest in multimodal applications has resulted in a wide selection of datasets and methods for representing and integrating information from different modalities. Despite these empirical advances, fundamental research questions remain: How can we quantify the interactions that are necessary to solve a multimodal task? And which multimodal models are most suitable for capturing these interactions? To answer these questions, we propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task. We call these three measures the PID statistics of a multimodal distribution (or PID for short), and introduce two new estimators for these PID statistics that scale to high-dimensional distributions. To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and large-scale multimodal benchmarks where PID estimates are compared with human annotations. Finally, we demonstrate the usefulness of these estimates in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies engaging with domain experts in pathology, mood prediction, and robotic perception, where our framework helps to recommend strong multimodal models for each application.
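
To make redundancy, uniqueness, and synergy concrete, here is a minimal sketch on a toy XOR task. It is not one of the two estimators introduced in the paper; it only uses the standard PID consistency relations I(X1;Y) = R + U1, I(X2;Y) = R + U2, and I(X1,X2;Y) = R + U1 + U2 + S, together with non-negativity of each term. The helper name `mutual_information` and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(pairs):
    """Return I(A;B) in bits for a list of equally weighted (a, b) samples.

    Hypothetical helper for illustration; uses plug-in (empirical) probabilities.
    """
    n = len(pairs)
    joint = Counter(pairs)
    marg_a = Counter(a for a, _ in pairs)
    marg_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((marg_a[a] / n) * (marg_b[b] / n)))
        for (a, b), c in joint.items()
    )


# Toy task (an assumption, not a dataset from the paper):
# y = x1 XOR x2 with uniform binary inputs.
samples = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

i_x1_y = mutual_information([(x1, y) for x1, _, y in samples])
i_x2_y = mutual_information([(x2, y) for _, x2, y in samples])
i_joint_y = mutual_information([((x1, x2), y) for x1, x2, y in samples])

# Since R, U1, U2 >= 0, I(X1;Y) = 0 and I(X2;Y) = 0 force R = U1 = U2 = 0,
# so in this special case all task-relevant information is synergy.
synergy = i_joint_y if i_x1_y == 0 and i_x2_y == 0 else None

print(f"I(X1;Y) = {i_x1_y:.3f} bits")
print(f"I(X2;Y) = {i_x2_y:.3f} bits")
print(f"I(X1,X2;Y) = {i_joint_y:.3f} bits")
print(f"Synergy = {synergy}")
```

In general the three mutual informations do not determine R, U1, U2, and S uniquely; XOR is a special case where the marginal terms vanish. Separating the terms on real, high-dimensional data is what dedicated estimators such as those described above are for.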

Paul Pu Liang, Yun Cheng, Xiang Fan, Chun Kai Ling, Suzanne Nie, Richard Chen, Zihao Deng, Nicholas Allen, Randy Auerbach, Faisal Mahmood, Ruslan Salakhutdinov, Louis-Philippe Morency • 2023

Related benchmarks

Task             Dataset                         Result (Performance)   Rank
Model Selection  5 Synthetic Datasets (unseen)   0.9991                 1
Model Selection  MIMIC (unseen)                  99.78                  1
Model Selection  ENRICO (unseen)                 1                      1
Model Selection  UR-FUNNY (unseen)               98.58                  1
Model Selection  MOSEI (unseen)                  99.35                  1
Model Selection  MUSTARD (unseen)                95.15                  1
Model Selection  MAPS (unseen)                   100                    1
