MFAS: Multimodal Fusion Architecture Search

About

We tackle the problem of finding good architectures for multimodal classification problems. We propose a novel and generic search space that spans a large number of possible fusion architectures. To find an optimal architecture for a given dataset within this search space, we leverage an efficient sequential model-based exploration approach tailored to the problem. We demonstrate the value of posing multimodal fusion as a neural architecture search problem through extensive experiments on a toy dataset and two real multimodal datasets. We discover fusion architectures that exhibit state-of-the-art performance on problems that differ in domain and dataset size, including the NTU RGB+D dataset, the largest multimodal action recognition dataset available.

Juan-Manuel Pérez-Rúa, Valentin Vielzeuf, Stéphane Pateux, Moez Baccouche, Frédéric Jurie • 2019
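
The abstract above describes a sequential model-based exploration of a space of fusion architectures. The sketch below illustrates that general idea on a toy problem; it is not the authors' implementation. The search-space encoding (one fusion layer as a triple of two feature indices and an activation), the additive surrogate, the synthetic `toy_evaluate` scorer, and every constant are assumptions made purely for illustration.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical search space: one fusion layer = (feature index from modality A,
# feature index from modality B, activation name). All sizes are assumptions.
N_FEATURES_A = 3
N_FEATURES_B = 3
ACTIVATIONS = ("relu", "sigmoid")
LAYER_CHOICES = list(
    itertools.product(range(N_FEATURES_A), range(N_FEATURES_B), ACTIVATIONS)
)


def toy_evaluate(arch):
    """Synthetic stand-in for training and validating a fusion network."""
    score = 0.0
    for depth, (a, b, act) in enumerate(arch):
        score += (a + b) / (depth + 1)
        score += 0.5 if act == "relu" else 0.0
    return score / len(arch)


class AdditiveSurrogate:
    """Predicts a score as the mean of per-layer average observed scores."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, arch, score):
        for layer in arch:
            self.totals[layer] += score
            self.counts[layer] += 1

    def predict(self, arch):
        seen = [l for l in arch if self.counts[l] > 0]
        if not seen:
            return 0.0
        return sum(self.totals[l] / self.counts[l] for l in seen) / len(seen)


def sequential_search(max_depth=3, samples_per_step=5, seed=0):
    rng = random.Random(seed)
    surrogate = AdditiveSurrogate()
    history = {}

    # Seed the surrogate by evaluating every single-layer architecture.
    frontier = [(layer,) for layer in LAYER_CHOICES]
    for arch in frontier:
        history[arch] = toy_evaluate(arch)
        surrogate.update(arch, history[arch])

    for _ in range(2, max_depth + 1):
        # Grow each frontier architecture by one more fusion layer.
        candidates = [a + (layer,) for a in frontier for layer in LAYER_CHOICES]
        rng.shuffle(candidates)  # random tie-breaking before the stable sort
        candidates.sort(key=surrogate.predict, reverse=True)
        # Only the candidates the surrogate ranks highest get evaluated.
        frontier = candidates[:samples_per_step]
        for arch in frontier:
            history[arch] = toy_evaluate(arch)
            surrogate.update(arch, history[arch])

    best = max(history, key=history.get)
    return best, history[best], len(history)


if __name__ == "__main__":
    best_arch, best_score, n_evaluated = sequential_search()
    print(best_arch, best_score, n_evaluated)
```

The point of the design is that the expensive step (here `toy_evaluate`, in practice a full training run) is applied only to the few candidates the cheap surrogate ranks highest, while the surrogate is refit after each round of evaluations.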

Related benchmarks

Task | Dataset | Result | Rank
Action Recognition | NTU RGB+D (Cross-subject) | Accuracy 90.04 | 474
Multimodal Multilabel Classification | MM-IMDB (test) | Macro F1 55.7 | 87
Multimodal genre classification | MM-IMDb 1.0 (test) | Macro F1 55.6 | 13
