
CentralNet: a Multilayer Approach for Multimodal Fusion

About

This paper proposes a novel multimodal fusion approach aiming to produce the best possible decisions by integrating information coming from multiple media. While most past multimodal approaches work either by projecting the features of different modalities into the same space, or by coordinating the representations of each modality through the use of constraints, our approach borrows from both visions. More specifically, assuming each modality can be processed by a separate deep convolutional network, allowing decisions to be taken independently for each modality, we introduce a central network linking the modality-specific networks. This central network not only provides a common feature embedding but also regularizes the modality-specific networks through the use of multi-task learning. The proposed approach is validated on four different computer vision tasks, on which it consistently improves the accuracy of existing multimodal fusion approaches.
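The fusion rule described above can be sketched in a few lines: each modality branch advances independently, while a central branch takes, at each layer, a weighted sum of its own previous activation and the current modality activations. The snippet below is a minimal numpy sketch, not the authors' implementation; the two modalities ("rgb", "depth"), the layer sizes, and the fusion weights `alpha` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # One fully connected layer with a ReLU activation.
    return np.maximum(0.0, x @ w + b)

# Two modality-specific branches (illustrative: RGB and depth), two layers each.
dim_in, dim_h = 8, 16
W = {m: [rng.standard_normal((dim_in if i == 0 else dim_h, dim_h)) * 0.1
         for i in range(2)] for m in ("rgb", "depth")}
b = {m: [np.zeros(dim_h) for _ in range(2)] for m in ("rgb", "depth")}

# Central-branch weights and the learned scalar fusion weights (alphas).
Wc = [rng.standard_normal((dim_h, dim_h)) * 0.1 for _ in range(2)]
bc = [np.zeros(dim_h) for _ in range(2)]
alpha = {"central": 0.5, "rgb": 0.25, "depth": 0.25}  # illustrative values

def central_net(x_rgb, x_depth):
    h = {"rgb": x_rgb, "depth": x_depth}
    h_c = np.zeros(dim_h)  # central state starts empty
    for i in range(2):
        # Modality branches advance independently...
        h = {m: dense(h[m], W[m][i], b[m][i]) for m in h}
        # ...while the central branch fuses a weighted sum of its own
        # previous state and the current modality activations.
        fused = alpha["central"] * h_c + sum(alpha[m] * h[m] for m in h)
        h_c = dense(fused, Wc[i], bc[i])
    return h_c  # central embedding, fed to the joint classifier

emb = central_net(rng.standard_normal(dim_in), rng.standard_normal(dim_in))
print(emb.shape)  # (16,)
```

In the paper the alphas are trainable parameters learned jointly with the networks, and each modality branch also keeps its own classification head, which is what provides the multi-task regularization.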

Valentin Vielzeuf, Alexis Lechervy, Stéphane Pateux, Frédéric Jurie • 2018

Related benchmarks

Task | Dataset | Metric | Result | Rank
Action Recognition | NTU RGB+D (Cross-subject) | Accuracy | 89.36 | 474
Multimodal Multilabel Classification | MM-IMDB (test) | Macro F1 | 56.1 | 87
