
Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition

About

Cross-lingual Speech Emotion Recognition (CLSER) aims to identify emotional states in languages unseen during training. However, existing methods rely heavily on complete, semantically synchronized labels and on static feature stability, which keeps performance on low-resource languages below that achieved on high-resource ones. To address this, we propose a semi-supervised framework based on Semantic-Emotional Resonance Embedding (SERE), a cross-lingual dynamic-feature paradigm that requires neither target-language labels nor translation alignment. Specifically, SERE constructs an emotion-semantic structure from a small number of labeled samples. Through an Instantaneous Resonance Field (IRF), it learns from human emotional experience, enabling unlabeled samples to self-organize into this structure. This provides both semi-supervised semantic guidance and structure discovery. In addition, we design a Triple-Resonance Interaction Chain (TRIC) loss that strengthens the interaction between labeled and unlabeled samples, and the quality of their embeddings, during emotionally salient moments. Extensive experiments across multiple languages demonstrate the effectiveness of our method, which requires only 5-shot labeling in the source language.

Ya Zhao, Yinfeng Yu, Liejun Wang · 2026
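The abstract does not specify how the IRF lets unlabeled samples "self-organize" into the emotion-semantic structure built from a few labeled samples. Purely as a point of reference, the sketch below shows a common and much simpler semi-supervised pattern with the same inputs: average the few labeled (e.g. 5-shot) source-language embeddings into one prototype per emotion, then softly assign each unlabeled embedding by temperature-scaled cosine similarity and keep only confident assignments as pseudo-labels. This is a generic prototype/pseudo-labeling technique, not SERE, IRF or TRIC; the function names, the toy 2-D embeddings, the temperature of 0.1 and the 0.9 confidence threshold are all hypothetical.

```python
import math

def class_prototypes(embeddings, labels):
    """Hypothetical helper: average the labeled embeddings of each class."""
    sums, counts = {}, {}
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def soft_assign(vec, prototypes, temperature=0.1):
    """Softmax over temperature-scaled cosine similarities to each prototype."""
    labs = sorted(prototypes)
    logits = [cosine(vec, prototypes[l]) / temperature for l in labs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {l: e / total for l, e in zip(labs, exps)}

def pseudo_label(unlabeled, prototypes, threshold=0.9, temperature=0.1):
    """Keep confident assignments; ambiguous samples stay unlabeled (None)."""
    out = []
    for vec in unlabeled:
        probs = soft_assign(vec, prototypes, temperature)
        best = max(probs, key=probs.get)
        out.append(best if probs[best] >= threshold else None)
    return out

# Toy usage: two labeled samples per class (hypothetical source-language embeddings).
labeled = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["angry", "angry", "sad", "sad"]
protos = class_prototypes(labeled, labels)
print(pseudo_label([[1.0, 0.05], [0.5, 0.5]], protos))  # ['angry', None]
```

The second unlabeled sample sits exactly between the two prototypes, so its assignment stays at 0.5/0.5 and it is left unlabeled rather than forced into a class; the threshold trades pseudo-label coverage against noise.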

Related benchmarks

Task                        Dataset (source → target)    Average Recall   Rank
Speech Emotion Recognition  EmoDB → CASIA (B→C)          48.68            13
Speech Emotion Recognition  CASIA → EmoDB (C→B)          69.28            13
Speech Emotion Recognition  EmoDB → eNTERFACE (B→E)      40.97            9
Speech Emotion Recognition  eNTERFACE → EmoDB (E→B)      54.47            9
Speech Emotion Recognition  CASIA → eNTERFACE (C→E)      40.52            9
Speech Emotion Recognition  eNTERFACE → CASIA (E→C)      40.05            9
Speech Emotion Recognition  EMOVO → CASIA (O→C)          51.98            9
Speech Emotion Recognition  CASIA → EMOVO (C→O)          48.55            9
Speech Emotion Recognition  EmoDB → EMOVO (B→O)          49.86            9
Speech Emotion Recognition  EMOVO → EmoDB (O→B)          58.43            9

Showing 10 of 12 rows.
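The page does not define "Average Recall". Assuming it is the unweighted average recall (UAR) commonly reported for cross-corpus speech emotion recognition, each score above is the mean of per-class recalls on the target corpus, expressed in percent: UAR = (1/K) · Σ_k TP_k / N_k, where K is the number of emotion classes, TP_k the correctly predicted samples of class k and N_k the true samples of class k. A minimal sketch of that computation (the function name and toy labels are illustrative only):

```python
from collections import Counter

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, in percent; every class weighs equally."""
    totals = Counter(y_true)  # N_k: number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP_k
    recalls = [hits[c] / totals[c] for c in totals]
    return 100.0 * sum(recalls) / len(recalls)

y_true = ["angry"] * 4 + ["sad"] * 2
y_pred = ["angry"] * 5 + ["sad"]
print(unweighted_average_recall(y_true, y_pred))  # 75.0
```

Plain accuracy on the same toy predictions would be 5/6 (about 83.3%), because the majority class dominates; UAR instead averages a recall of 1.0 for "angry" and 0.5 for "sad", which is why it is preferred when emotion corpora are class-imbalanced.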
