# Dual-branch Graph Domain Adaptation for Cross-scenario Multi-modal Emotion Recognition

## About
Multimodal Emotion Recognition in Conversations (MERC) aims to predict speakers' emotional states in multi-turn dialogues through text, audio, and visual cues. In real-world settings, conversation scenarios differ significantly in speakers, topics, styles, and noise levels. Existing MERC methods generally neglect these cross-scenario variations, limiting their ability to transfer models trained on a source domain to unseen target domains. To address this issue, we propose a Dual-branch Graph Domain Adaptation framework (DGDA) for multimodal emotion recognition under cross-scenario conditions. We first construct an emotion interaction graph to characterize complex emotional dependencies among utterances. A dual-branch encoder, consisting of a hypergraph neural network (HGNN) and a path neural network (PathNN), is then designed to explicitly model multivariate relationships and implicitly capture global dependencies. To enable out-of-domain generalization, a domain adversarial discriminator is introduced to learn invariant representations across domains. Furthermore, a regularization loss is incorporated to suppress the negative influence of noisy labels. To the best of our knowledge, DGDA is the first MERC framework that jointly addresses domain shift and label noise. Theoretical analysis provides tighter generalization bounds, and extensive experiments on IEMOCAP and MELD demonstrate that DGDA consistently outperforms strong baselines and better adapts to cross-scenario conversations. Our code is available at https://github.com/Xudmm1239439/DGDA-Net.
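To make the HGNN branch concrete: a standard hypergraph convolution layer propagates utterance features through hyperedges via `X' = σ(D_v⁻¹ H W D_e⁻¹ Hᵀ X Θ)`, where `H` is the vertex-hyperedge incidence matrix. The sketch below is an illustrative NumPy implementation only, not the repository's actual layer; the group-by-speaker hyperedges, identity edge weights `W`, and dimensions are all assumptions for the example.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One hypergraph convolution step:
    X' = ReLU(D_v^{-1} H W D_e^{-1} H^T X Theta), with W = I here.
    X: (num_vertices, d) features, H: (num_vertices, num_edges) incidence,
    Theta: (d, d') learnable projection."""
    Dv = H.sum(axis=1)                      # vertex degrees
    De = H.sum(axis=0)                      # hyperedge degrees
    out = np.diag(1.0 / Dv) @ H @ np.diag(1.0 / De) @ H.T @ X @ Theta
    return np.maximum(out, 0.0)             # ReLU nonlinearity

rng = np.random.default_rng(0)
# 4 utterances, 2 hyperedges (e.g. hypothetical same-speaker groupings)
H = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [1, 1]], dtype=float)
X = rng.standard_normal((4, 8))             # utterance features
Theta = rng.standard_normal((8, 8))         # projection weights
out = hypergraph_conv(X, H, Theta)
print(out.shape)  # (4, 8)
```

Stacking such layers lets each utterance aggregate evidence from every conversational group (hyperedge) it participates in, which is what allows the branch to model multivariate, rather than pairwise, emotional dependencies.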
## Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Cross-scenario Multimodal Emotion Recognition | IEMOCAP → MELD, 20% label noise (test) | Joy Score | 54.97 | 15 |
| Cross-scenario Multimodal Emotion Recognition | MELD → IEMOCAP, 20% label noise (test) | Joy Accuracy | 59.6 | 15 |
| Multimodal Emotion Recognition in Conversations | IEMOCAP → MELD (target) | Joy Score | 56.21 | 15 |
| Multimodal Emotion Recognition in Conversations | MELD → IEMOCAP (target) | Joy Accuracy | 61.04 | 15 |
| Cross-scenario Multimodal Emotion Recognition in Conversations | IEMOCAP → MELD, 40% label noise (test) | Joy Accuracy | 37.85 | 15 |
| Cross-scenario Multimodal Emotion Recognition in Conversations | MELD → IEMOCAP, 40% label noise (test) | Joy Accuracy | 35.58 | 15 |