Factorized Contrastive Learning: Going Beyond Multi-view Redundancy

About

In a wide range of multimodal tasks, contrastive learning has become a particularly appealing approach since it can successfully learn representations from abundant unlabeled data with only pairing information (e.g., image-caption or video-audio pairs). Underpinning these approaches is the assumption of multi-view redundancy - that shared information between modalities is necessary and sufficient for downstream tasks. However, in many real-world settings, task-relevant information is also contained in modality-unique regions: information that is only present in one modality but still relevant to the task. How can we learn self-supervised multimodal representations to capture both shared and unique information relevant to downstream tasks? This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy. FactorCL is built from three new contributions: (1) factorizing task-relevant information into shared and unique representations, (2) capturing task-relevant information via maximizing MI lower bounds and removing task-irrelevant information via minimizing MI upper bounds, and (3) multimodal data augmentations to approximate task relevance without labels. On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results on six benchmarks.
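To make the idea of "maximizing MI lower bounds" concrete, here is a minimal sketch of one well-known contrastive lower bound on mutual information, InfoNCE. The abstract above does not spell out which estimators are used, so treat this as an illustration of the general technique rather than the authors' implementation. The function names (`dot`, `infonce_lower_bound`), the dot-product critic, and the toy one-hot data are all assumptions made for this example.

```python
import math


def dot(u, v):
    """Hypothetical critic: a plain dot product between two vectors."""
    return sum(a * b for a, b in zip(u, v))


def infonce_lower_bound(xs, ys, critic=dot):
    """InfoNCE estimate (in nats) for paired samples xs[i] <-> ys[i].

    For each x_i, the paired y_i is the positive and every y_j in the
    batch acts as a candidate. The estimate can never exceed log(N),
    where N is the batch size.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    total = 0.0
    for i in range(n):
        scores = [critic(xs[i], ys[j]) for j in range(n)]
        # Numerically stable log of the mean of exp(scores).
        m = max(scores)
        log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scores) / n)
        total += scores[i] - log_mean_exp
    return total / n


# Toy data (an assumption for illustration): four one-hot "views" that
# match their partner exactly, scored by a sharpened dot product.
views = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
sharp_estimate = infonce_lower_bound(
    views, views, critic=lambda u, v: 10.0 * dot(u, v)
)
# A critic that ignores its inputs carries no information about pairing.
flat_estimate = infonce_lower_bound(views, views, critic=lambda u, v: 0.0)
```

With perfectly matched pairs and a sharp critic, `sharp_estimate` approaches the ceiling of log(4) nats, while the constant critic yields an estimate of zero. Training a contrastive encoder amounts to adjusting the critic so that this quantity increases; an upper-bound term, as described in contribution (2), would be minimized separately and is not shown here.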

Paul Pu Liang, Zihao Deng, Martin Ma, James Zou, Louis-Philippe Morency, Ruslan Salakhutdinov • 2023

Related benchmarks

Task	Dataset	Result
Image Classification	MNIST	Accuracy 99.21	417
Multimodal Sentiment Analysis	MOSEI	--	210
Multimodal Sentiment Analysis	CMU-MOSI	--	179
Sentiment Analysis	CMU-MOSEI (test)	--	96
Multimodal Classification	MIMIC	--	72
Multimodal Action Recognition	UCF101	Accuracy 81.47	23
Humor Detection	UR-FUNNY	Accuracy 63.5	22
Multimodal Classification	UR-FUNNY	Accuracy 63.52	21
Image-Text Classification	IRFL	Accuracy 98.8	16
Multimodal Emotion Recognition	CREMA-D	Accuracy 74.73	14

Showing 10 of 18 rows
