Contrast and Generation Make BART a Good Dialogue Emotion Recognizer
About
In dialogue systems, utterances with similar semantics may carry distinct emotions in different contexts. Therefore, modeling long-range contextual emotional relationships with speaker dependency plays a crucial part in dialogue emotion recognition. At the same time, distinguishing between emotion categories is non-trivial, since they often express semantically similar sentiments. To this end, we adopt supervised contrastive learning to make different emotions mutually exclusive, so that similar emotions can be identified more reliably. In addition, we use an auxiliary response generation task to enhance the model's ability to handle context information, thereby forcing the model to recognize emotions with similar semantics in diverse contexts. To achieve these objectives, we use the pre-trained encoder-decoder model BART as our backbone, since it is well suited to both understanding and generation tasks. Experiments on four datasets demonstrate that our proposed model obtains significantly more favorable results than the state-of-the-art model in dialogue emotion recognition. The ablation study further demonstrates the effectiveness of the supervised contrastive loss and the generative loss.
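The supervised contrastive objective mentioned above can be illustrated with a minimal sketch. It assumes the standard supervised contrastive formulation (same-label utterances in a batch are positives, all other utterances are contrasted against); the abstract does not give the exact formulation, so the function names, the temperature value, and the weighting in `combined_loss` are hypothetical and not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch.

    embeddings: list of utterance vectors (lists of floats)
    labels:     list of emotion labels, one per utterance
    Anchors with no same-label partner in the batch are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other utterance.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp for the denominator.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def combined_loss(ce_loss, scl_loss, gen_loss, alpha=0.5, beta=0.5):
    """Hypothetical weighting of the three objectives (assumption, not the paper's)."""
    return ce_loss + alpha * scl_loss + beta * gen_loss


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
    # Labels that agree with the geometry give a lower loss than labels that do not.
    print(supervised_contrastive_loss(vectors, ["joy", "joy", "anger", "anger"]))
    print(supervised_contrastive_loss(vectors, ["joy", "anger", "joy", "anger"]))
```

In this sketch the loss is small when utterances with the same emotion label already sit close together in embedding space and large when they do not, which is the pressure that pushes similar-looking emotions apart during training.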
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Emotion Recognition in Conversation | IEMOCAP (test) | Weighted Average F1 Score | 66.18 | 154 |
| Emotion Recognition in Conversation | MELD | Weighted Avg F1 | 64.81 | 137 |
| Conversational Emotion Recognition | IEMOCAP | Weighted Average F1 Score | 66.18 | 129 |
| Emotion Recognition in Conversation | MELD (test) | Weighted F1 | 64.81 | 118 |
| Emotion Detection | EmoryNLP (test) | Weighted-F1 | 0.3904 | 96 |
| Dialogue Emotion Detection | EmoryNLP | Weighted Avg F1 | 39.04 | 80 |
| Emotion Recognition | IEMOCAP | Accuracy | 66.71 | 71 |
| Dialogue Emotion Detection | DailyDialog | Micro F1 (- neutral) | 0.5552 | 27 |
| Emotion Recognition in Conversation | DailyDialog (test) | -- | -- | 16 |
| Emotion Recognition in Conversations | IEMOCAP (standard split) | Micro F1 | 64.1 | 11 |