S+PAGE: A Speaker and Position-Aware Graph Neural Network Model for Emotion Recognition in Conversation
About
Emotion recognition in conversation (ERC) has attracted much attention in recent years because of its importance to a wide range of applications. Existing ERC methods mostly model the self-speaker and inter-speaker context separately, which leaves too little interaction between the two. In this paper, we propose a novel Speaker and Position-Aware Graph neural network model for ERC (S+PAGE), which contains three stages that combine the benefits of the Transformer and the relational graph convolutional network (R-GCN) for better contextual modeling. First, a two-stream conversational Transformer extracts coarse self-speaker and inter-speaker contextual features for each utterance. Then, a speaker and position-aware conversation graph is constructed, and we propose an enhanced R-GCN model, called PAG, to refine the coarse features under the guidance of a relative positional encoding. Finally, the features from both preceding stages are fed into a conditional random field layer to model emotion transfer.
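The abstract does not spell out how the speaker and position-aware conversation graph is built. The following is a minimal sketch under stated assumptions, not the authors' construction: each utterance is a node, edges connect utterances within a fixed context window, each edge is typed by whether the two utterances share a speaker, and the relative position is clipped to a maximum distance. The function name `build_conversation_graph`, the `window` and `max_distance` parameters, and the `"self"` / `"inter"` labels are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical edge record:
# (source index, target index, relation label, clipped relative position)
Edge = Tuple[int, int, str, int]


def build_conversation_graph(
    speakers: List[str], window: int = 2, max_distance: int = 2
) -> List[Edge]:
    """Build typed, position-tagged edges between utterances.

    Illustrative assumption only: the paper's actual graph may use a
    different window, different relation types, or a different encoding.
    """
    edges: List[Edge] = []
    n = len(speakers)
    for i in range(n):
        first = max(0, i - window)
        last = min(n - 1, i + window)
        for j in range(first, last + 1):
            # Speaker awareness: same speaker vs. different speaker.
            relation = "self" if speakers[i] == speakers[j] else "inter"
            # Position awareness: signed offset, clipped to +/- max_distance.
            offset = j - i
            clipped = max(-max_distance, min(max_distance, offset))
            edges.append((i, j, relation, clipped))
    return edges


# Three utterances from two speakers, with a context window of one turn.
for edge in build_conversation_graph(["A", "B", "A"], window=1, max_distance=1):
    print(edge)
```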
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Emotion Recognition in Conversation | IEMOCAP (test) | Weighted Average F1 Score | 68.93 | 154 |
| Emotion Recognition in Conversation | MELD (test) | Weighted F1 | 64.67 | 118 |
| Emotion Detection | EmoryNLP (test) | Weighted-F1 | 0.4005 | 96 |
| Emotion Recognition in Conversation | DailyDialog (test) | -- | -- | 16 |
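
Most of the results above are reported as weighted F1. As a rough illustration of that metric, the sketch below computes per-class F1 and averages it with weights proportional to each class's share of the true labels. This is the common support-weighted definition; the exact evaluation scripts behind the listed numbers are not shown here and may differ. The function name `weighted_f1` and the emotion labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Average of per-class F1 scores, weighted by class support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Toy labels (not taken from any of the datasets above).
y_true = ["happy", "happy", "sad", "angry"]
y_pred = ["happy", "sad", "sad", "angry"]
print(weighted_f1(y_true, y_pred))
```

The function returns a fraction between 0 and 1; multiplying by 100 gives the percentage scale that some leaderboards use.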