Memory Fusion Network for Multi-view Sequential Learning
About
Multi-view sequential learning is a fundamental problem in machine learning that deals with multi-view sequences. In a multi-view sequence, there exist two forms of interactions between different views: view-specific interactions and cross-view interactions. In this paper, we present a new neural architecture for multi-view sequential learning called the Memory Fusion Network (MFN), which explicitly accounts for both forms of interaction and continuously models them through time. The first component of the MFN is the System of LSTMs, where view-specific interactions are learned in isolation by assigning an LSTM function to each view. The cross-view interactions are then identified using a special attention mechanism called the Delta-memory Attention Network (DMAN) and summarized through time with a Multi-view Gated Memory. Through extensive experimentation, MFN is compared to previously proposed approaches for multi-view sequential learning on multiple publicly available benchmark datasets. MFN outperforms all existing multi-view approaches as well as all current state-of-the-art models, setting new state-of-the-art results on these multi-view datasets.
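The three components named in the abstract can be illustrated with a small NumPy sketch of the forward pass. This is not the authors' implementation. The abstract does not give layer definitions, so the following are assumptions made for illustration: each learned sub-network is reduced to a single linear layer, the attention is a softmax over the concatenated LSTM memories at steps t-1 and t, and the gated memory uses a retain gate and an update gate. The names `LSTMCell`, `MFNSketch`, `W_att`, `W_u`, `W_g1`, and `W_g2` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

class LSTMCell:
    """Minimal LSTM cell; one instance per view (System of LSTMs)."""
    def __init__(self, in_dim, hid_dim, rng):
        self.W = rng.normal(0.0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

class MFNSketch:
    """Illustrative forward pass only; single linear layers are an assumption."""
    def __init__(self, view_dims, hid_dim=8, mem_dim=6, seed=0):
        rng = np.random.default_rng(seed)
        self.hid_dim, self.mem_dim = hid_dim, mem_dim
        self.cells = [LSTMCell(d, hid_dim, rng) for d in view_dims]
        cat = 2 * hid_dim * len(view_dims)  # memories at t-1 and t, all views
        self.W_att = rng.normal(0.0, 0.1, (cat, cat))     # attention scores
        self.W_u = rng.normal(0.0, 0.1, (mem_dim, cat))   # memory update proposal
        self.W_g1 = rng.normal(0.0, 0.1, (mem_dim, cat))  # retain gate
        self.W_g2 = rng.normal(0.0, 0.1, (mem_dim, cat))  # update gate

    def forward(self, views):
        """views: list of arrays, one per view, each of shape (T, view_dim)."""
        T = views[0].shape[0]
        h = [np.zeros(self.hid_dim) for _ in views]
        c = [np.zeros(self.hid_dim) for _ in views]
        u = np.zeros(self.mem_dim)  # multi-view gated memory
        for t in range(T):
            c_prev = c
            # View-specific dynamics: each LSTM sees only its own view.
            stepped = [cell.step(v[t], h_n, c_n)
                       for cell, v, h_n, c_n in zip(self.cells, views, h, c)]
            h = [s[0] for s in stepped]
            c = [s[1] for s in stepped]
            # Cross-view attention over memories at t-1 and t.
            c_cat = np.concatenate(c_prev + c)
            c_hat = softmax(self.W_att @ c_cat) * c_cat
            # Gated memory: keep part of the old state, write part of the new.
            u_hat = np.tanh(self.W_u @ c_hat)
            u = sigmoid(self.W_g1 @ c_hat) * u + sigmoid(self.W_g2 @ c_hat) * u_hat
        # Final representation: last hidden state of every view plus the memory.
        return np.concatenate(h + [u])

# Usage: three views with hypothetical feature sizes 5, 3, and 4 over 7 time steps.
rng = np.random.default_rng(1)
views = [rng.normal(size=(7, d)) for d in (5, 3, 4)]
z = MFNSketch(view_dims=(5, 3, 4)).forward(views)
print(z.shape)  # (30,) = 3 views * 8 hidden units + 6 memory units
```

Feeding the attention both the previous and the current memories lets it respond to what changed between steps, which is one plausible reading of "Delta-memory". The returned vector would typically go to a task-specific output layer, which is omitted here.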
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multimodal Sentiment Analysis | CMU-MOSEI (test) | F1 Score | 78.9 | 206 |
| Emotion Recognition in Conversation | IEMOCAP (test) | Weighted Average F1 Score | 61.6 | 154 |
| Conversational Emotion Recognition | IEMOCAP | Weighted Average F1 Score | 60.32 | 129 |
| Emotion Recognition in Conversation | MELD (test) | Weighted F1 | 57.8 | 118 |
| Emotion Recognition | IEMOCAP | -- | -- | 71 |
| Multimodal Sentiment Analysis | CMU-MOSI | MAE | 0.965 | 59 |
| Emotion Classification | IEMOCAP (test) | -- | -- | 36 |
| Emotion Detection | MELD (test) | -- | -- | 32 |
| Multimodal Sentiment Analysis | CH-SIMS V2 | Accuracy (2-Class) | 79.4 | 29 |
| Multimodal Emotion Recognition | IEMOCAP 6-way | F1 (Avg) | 59.9 | 28 |