
Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition

About

Cross-corpus speech emotion recognition (SER) aims to transfer emotional knowledge from a labeled source corpus to an unlabeled target corpus. However, prior methods require access to the source data during adaptation, which is often infeasible in real-life scenarios because of data privacy concerns. This paper tackles a more practical task, namely source-free cross-corpus SER, where a pre-trained source model is adapted to the target domain without access to the source data. To address this problem, we propose a novel method called the emotion-aware contrastive adaptation network (ECAN). The core idea is to capture local neighborhood information between samples while also accounting for global class-level adaptation. Specifically, we propose nearest-neighbor contrastive learning to promote local emotion consistency among the features of highly similar samples. However, relying solely on nearest neighbors may lead to ambiguous boundaries between clusters. We therefore incorporate supervised contrastive learning to encourage greater separation between clusters representing different emotions, which improves class-level adaptation. Extensive experiments indicate that the proposed ECAN significantly outperforms state-of-the-art methods under the source-free cross-corpus SER setting on several speech emotion corpora.
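The two contrastive terms described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the InfoNCE-style loss form, the temperature value, the use of pseudo-labels on the unlabeled target data, the unweighted sum of the two terms, and all function names are assumptions made for illustration only.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(features, positives, temperature=0.1):
    """InfoNCE-style loss (assumed form): for each anchor i, raise the
    softmax probability of positives[i] relative to all other samples.
    Anchors without positives are skipped."""
    z = [normalize(f) for f in features]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        if not positives[i]:
            continue
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        m = max(logits.values())  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average the negative log-probability over this anchor's positives
        total -= sum(logits[j] - log_denom for j in positives[i]) / len(positives[i])
        count += 1
    return total / count if count else 0.0

def nearest_neighbor_positives(features, k=1):
    """Local term: each sample's positives are its k most similar samples."""
    z = [normalize(f) for f in features]
    n = len(z)
    result = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dot(z[i], z[j]), reverse=True)
        result.append(others[:k])
    return result

def same_label_positives(labels):
    """Class-level term: positives are samples sharing the same label.
    Pseudo-labels are assumed here because the target corpus is unlabeled."""
    return [[j for j, lj in enumerate(labels) if j != i and lj == li]
            for i, li in enumerate(labels)]

# Toy 2-D features forming two clusters (hypothetical data)
features = [[1.0, 0.1], [0.9, 0.2], [-1.0, 0.1], [-0.8, -0.1]]
pseudo_labels = [0, 0, 1, 1]

nn_loss = contrastive_loss(features, nearest_neighbor_positives(features, k=1))
sup_loss = contrastive_loss(features, same_label_positives(pseudo_labels))
total_loss = nn_loss + sup_loss  # unweighted sum; the weighting is an assumption
```

In this sketch both terms share one loss function and differ only in how positives are chosen: by feature similarity for the local term, and by label agreement for the class-level term.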

Yan Zhao, Jincen Wang, Cheng Lu, Sunan Li, Björn Schuller, Yuan Zong, Wenming Zheng • 2024

Related benchmarks

| Task | Dataset | Average Recall | Rank |
| --- | --- | --- | --- |
| Speech Emotion Recognition | CASIA to EmoDB (C→B) | 61.37 | 13 |
| Speech Emotion Recognition | EmoDB to CASIA (B→C) | 39 | 13 |
| Speech Emotion Recognition | EmoDB to eNTERFACE (B→E) | 34.21 | 9 |
| Speech Emotion Recognition | CASIA to eNTERFACE | 34.53 | 9 |
| Speech Emotion Recognition | eNTERFACE to CASIA (E→C) | 31.9 | 9 |
| Speech Emotion Recognition | EmoDB to EMOVO (B→O) | 36.51 | 9 |
| Speech Emotion Recognition | CASIA to EMOVO (C→O) | 35.91 | 9 |
| Speech Emotion Recognition | eNTERFACE to EmoDB (E→B) | 46.87 | 9 |
| Speech Emotion Recognition | EMOVO to CASIA (O→C) | 27.42 | 9 |
| Speech Emotion Recognition | EMOVO to EmoDB (O→B) | 40.86 | 9 |
Showing 10 of 12 rows
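
For readers unfamiliar with the metric, here is a minimal sketch of how an average recall score can be computed. It assumes that "Average Recall" here means the unweighted mean of per-class recalls, a common convention in SER; the page itself does not define the metric. The function name and the toy labels are hypothetical.

```python
def average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally
    regardless of how many samples it has."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical labels for illustration only
y_true = ["angry", "angry", "angry", "sad", "happy", "happy"]
y_pred = ["angry", "angry", "sad", "sad", "happy", "angry"]
uar = average_recall(y_true, y_pred)  # (2/3 + 1/1 + 1/2) / 3 = 13/18
```

Unlike plain accuracy (4 of 6 correct in this toy case), the score is not dominated by the most frequent class, which matters when emotion classes are imbalanced across corpora.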
