Gated Multimodal Units for Information Fusion

About

This paper presents a novel model for multimodal learning based on gated neural networks. The Gated Multimodal Unit (GMU) model is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU uses multiplicative gates to learn how much each modality influences the activation of the unit. It was evaluated on a multilabel scenario for genre classification of movies using the plot and the poster. The GMU improved the macro F-score of single-modality approaches and outperformed other fusion strategies, including mixture-of-experts models. Along with this work, the MM-IMDb dataset is released; to the best of our knowledge, it is the largest publicly available multimodal dataset for genre prediction on movies.
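The gating idea in the abstract can be illustrated with a minimal NumPy sketch of a two-modality unit. It assumes the bimodal form usually given for the GMU: each modality gets its own tanh hidden vector, and a sigmoid gate computed from the concatenated inputs mixes the two element-wise. The function name `gmu_bimodal`, the weight names, the dimensions, and the random weights are illustrative assumptions, not the authors' code; bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(a):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def gmu_bimodal(x_v, x_t, W_v, W_t, W_z):
    """Hypothetical two-modality gated unit (sketch, no biases).

    x_v: visual features, shape (batch, d_v)
    x_t: textual features, shape (batch, d_t)
    Returns the fused representation h and the gate z, both (batch, d_h).
    """
    h_v = np.tanh(x_v @ W_v)  # modality-specific hidden vector (visual)
    h_t = np.tanh(x_t @ W_t)  # modality-specific hidden vector (textual)
    # The gate sees both modalities and decides, per hidden dimension,
    # how much the visual vector contributes versus the textual one.
    z = sigmoid(np.concatenate([x_v, x_t], axis=-1) @ W_z)
    h = z * h_v + (1.0 - z) * h_t  # element-wise convex combination
    return h, z

# Toy inputs with assumed sizes; real features would come from
# poster and plot encoders.
rng = np.random.default_rng(0)
batch, d_v, d_t, d_h = 4, 8, 6, 5
x_v = rng.normal(size=(batch, d_v))
x_t = rng.normal(size=(batch, d_t))
W_v = rng.normal(scale=0.1, size=(d_v, d_h))
W_t = rng.normal(scale=0.1, size=(d_t, d_h))
W_z = rng.normal(scale=0.1, size=(d_v + d_t, d_h))

h, z = gmu_bimodal(x_v, x_t, W_v, W_t, W_z)
```

Because the gate is a sigmoid, each output dimension lies between the corresponding visual and textual hidden values, so inspecting `z` gives a rough view of which modality dominates for a given input.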

John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, Fabio A. González • 2017

Related benchmarks

Task	Dataset	Result
Multimodal Multilabel Classification	MM-IMDB (test)	Macro F1 51.4	104
Cancer Classification	BRCA	Accuracy 80	22
Binary Classification	ROSMAP	Accuracy 77.6	16
Binary Classification	LGG	Accuracy 80.3	16
Multi-class classification	KIPAN	Accuracy 97.7	16
