
Multi-attention Recurrent Network for Human Communication Comprehension

About

Human face-to-face communication is a complex multimodal signal. We use words (language modality), gestures (vision modality) and changes in tone (acoustic modality) to convey our intentions. Humans easily process and understand face-to-face communication; however, comprehending this form of communication remains a significant challenge for Artificial Intelligence (AI). AI must understand each modality and the interactions between them that shape human communication. In this paper, we present a novel neural architecture for understanding human communication called the Multi-attention Recurrent Network (MARN). The main strength of our model comes from discovering interactions between modalities through time using a neural component called the Multi-attention Block (MAB) and storing them in the hybrid memory of a recurrent component called the Long-short Term Hybrid Memory (LSTHM). We perform extensive comparisons on six publicly available datasets for multimodal sentiment analysis, speaker trait recognition and emotion recognition. MARN shows state-of-the-art performance on all the datasets.
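The core idea of the MAB described above — several parallel attentions over the concatenated per-modality hidden states, each extracting one cross-modal interaction per time step — can be sketched as follows. This is a minimal illustrative sketch in NumPy: the dimensions, the use of plain linear attention scorers, and the dimension-wise re-weighting are assumptions for exposition, not the authors' exact parameterisation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Assumed per-modality hidden sizes (illustrative, not from the paper).
d_l, d_v, d_a = 4, 3, 3          # language / vision / acoustic
K = 2                            # number of attentions in the MAB
d = d_l + d_v + d_a              # concatenated dimension

# One linear attention scorer per head (assumed parameterisation).
W_att = rng.standard_normal((K, d))

def multi_attention_block(h_l, h_v, h_a):
    """Apply K softmax attentions over the concatenated hidden states.

    Each attention re-weights the concatenation dimension-wise,
    yielding one candidate cross-modal interaction per head.
    Returns an array of shape (K, d).
    """
    h = np.concatenate([h_l, h_v, h_a])        # (d,)
    outputs = []
    for k in range(K):
        a_k = softmax(W_att[k] * h)            # attention weights over coordinates
        outputs.append(a_k * h)                # re-weighted copy of the states
    return np.stack(outputs)

z = multi_attention_block(rng.standard_normal(d_l),
                          rng.standard_normal(d_v),
                          rng.standard_normal(d_a))
print(z.shape)  # (2, 10)
```

In the full model, the K re-weighted outputs would be fed back into each modality's LSTHM, whose hybrid memory stores both modality-specific dynamics and the discovered cross-modal interactions.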

Amir Zadeh, Paul Pu Liang, Soujanya Poria, Prateek Vij, Erik Cambria, Louis-Philippe Morency • 2018

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Emotion Recognition in Conversation | IEMOCAP (test) | – | 154 |
| Multimodal Sentiment Analysis | CMU-MOSI | MAE: 0.968 | 59 |
| Emotion Classification | IEMOCAP (test) | – | 36 |
| Sentiment Analysis | CMU-MOSI | Accuracy (2-class): 77.1 | 21 |
| Binary Sentiment Classification | CMU-MOSI (test) | A2 Score: 77.1 | 17 |
| Multiclass Sentiment Classification | CMU-MOSI (test) | A7: 34.7 | 16 |
| Sentiment Analysis | ICT-MMMO (test) | A2 Score: 86.3 | 15 |
| Sentiment Analysis | YouTube (test) | A3 Score: 54.2 | 15 |
| Sentiment Analysis | MOUD (test) | A2: 81.1 | 15 |
| Speaker personality trait recognition | POM (test) | Confident (A7): 29.1 | 12 |

Showing 10 of 14 rows.
