
CentralNet: a Multilayer Approach for Multimodal Fusion

About

This paper proposes a novel multimodal fusion approach aiming to produce the best possible decisions by integrating information coming from multiple media. While most past multimodal approaches work either by projecting the features of different modalities into the same space, or by coordinating the representations of each modality through the use of constraints, our approach borrows from both visions. More specifically, assuming each modality can be processed by a separate deep convolutional network, allowing decisions to be taken independently for each modality, we introduce a central network linking the modality-specific networks. This central network not only provides a common feature embedding but also regularizes the modality-specific networks through the use of multi-task learning. The proposed approach is validated on four different computer vision tasks, on which it consistently improves the accuracy of existing multimodal fusion approaches.
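The fusion rule described above can be sketched in a few lines: each modality branch advances independently, while a central branch takes, at each layer, a weighted sum of its own previous activation and the current modality activations. The snippet below is a minimal numpy sketch, not the authors' implementation; the two modalities ("rgb", "depth"), the layer sizes, and the fusion weights `alpha` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # One fully connected layer with a ReLU activation.
    return np.maximum(0.0, x @ w + b)

# Two modality-specific branches (illustrative: RGB and depth), two layers each.
dim_in, dim_h = 8, 16
W = {m: [rng.standard_normal((dim_in if i == 0 else dim_h, dim_h)) * 0.1
         for i in range(2)] for m in ("rgb", "depth")}
b = {m: [np.zeros(dim_h) for _ in range(2)] for m in ("rgb", "depth")}

# Central-branch weights and the learned scalar fusion weights (alphas).
Wc = [rng.standard_normal((dim_h, dim_h)) * 0.1 for _ in range(2)]
bc = [np.zeros(dim_h) for _ in range(2)]
alpha = {"central": 0.5, "rgb": 0.25, "depth": 0.25}  # illustrative values

def central_net(x_rgb, x_depth):
    h = {"rgb": x_rgb, "depth": x_depth}
    h_c = np.zeros(dim_h)  # central state starts empty
    for i in range(2):
        # Modality branches advance independently...
        h = {m: dense(h[m], W[m][i], b[m][i]) for m in h}
        # ...while the central branch fuses a weighted sum of its own
        # previous state and the current modality activations.
        fused = alpha["central"] * h_c + sum(alpha[m] * h[m] for m in h)
        h_c = dense(fused, Wc[i], bc[i])
    return h_c  # central embedding, fed to the joint classifier

emb = central_net(rng.standard_normal(dim_in), rng.standard_normal(dim_in))
print(emb.shape)  # (16,)
```

In the paper the alphas are trainable parameters learned jointly with the networks, and each modality branch also keeps its own classification head, which is what provides the multi-task regularization.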

Valentin Vielzeuf, Alexis Lechervy, Stéphane Pateux, Frédéric Jurie • 2018

Related benchmarks

Task | Dataset | Metric | Result | Rank
Action Recognition | NTU RGB+D (Cross-subject) | Accuracy | 89.36 | 474
Multimodal Multilabel Classification | MM-IMDB (test) | Macro F1 | 56.1 | 87
