Learning Robust Self-attention Features for Speech Emotion Recognition with Label-adaptive Mixup

About

Speech Emotion Recognition (SER) aims to recognize human emotions in natural verbal interactions with machines. It is considered a challenging problem because human emotions are inherently ambiguous. Despite recent progress in SER, state-of-the-art models still struggle to achieve satisfactory performance. We propose a self-attention-based method that combines label-adaptive mixup with center loss. By adapting the label probabilities in mixup and fitting the center loss to the mixup training scheme, the proposed method outperforms state-of-the-art methods.
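The two training ingredients named above can be illustrated with a small, framework-free sketch. It is not the authors' implementation, and the abstract does not specify the adaptation rule, so everything beyond standard mixup is an assumption. The sketch uses standard mixup (a coefficient λ drawn from Beta(α, α) that linearly blends two inputs) and, as a hypothetical stand-in for the label adaptation, re-weights the two labels by how much real (non-padded) signal each source contributes. The center-loss term is likewise an assumed mixup-compatible variant that pulls a mixed feature toward both class centers, weighted by the adapted label probabilities. The names `adaptive_mixup`, `soft_target`, and `mixup_center_loss` are illustrative only.

```python
import random
from typing import List, Optional, Sequence, Tuple


def adaptive_mixup(
    x_a: Sequence[float], len_a: int,
    x_b: Sequence[float], len_b: int,
    alpha: float = 0.2,
    lam: Optional[float] = None,
) -> Tuple[List[float], float]:
    """Blend two equally padded waveforms; return (mixed, label weight of a).

    x_a, x_b: waveforms zero-padded to the same length.
    len_a, len_b: number of real (non-padded) samples in each waveform.
    """
    if len(x_a) != len(x_b):
        raise ValueError("inputs must be padded to the same length")
    if len_a <= 0 or len_b <= 0:
        raise ValueError("each input needs at least one real sample")
    if lam is None:
        lam = random.betavariate(alpha, alpha)  # standard mixup coefficient
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    # HYPOTHETICAL label adaptation (not taken from the paper): weight each
    # label by the real signal it contributes, so a short, mostly padded
    # utterance does not claim a full lam share of the target.
    eff_a = lam * len_a
    eff_b = (1.0 - lam) * len_b
    w_a = eff_a / (eff_a + eff_b)
    return mixed, w_a


def soft_target(cls_a: int, cls_b: int, w_a: float, n_classes: int) -> List[float]:
    """Mixed one-hot target: w_a on class a and 1 - w_a on class b."""
    target = [0.0] * n_classes
    target[cls_a] += w_a
    target[cls_b] += 1.0 - w_a
    return target


def mixup_center_loss(
    feat: Sequence[float],
    centers: Sequence[Sequence[float]],
    cls_a: int, cls_b: int, w_a: float,
) -> float:
    """ASSUMED mixup-compatible center loss: a w_a-weighted combination of
    the usual 0.5 * ||f - c_y||^2 terms for both source classes."""
    def sq_dist(center: Sequence[float]) -> float:
        return sum((f - c) ** 2 for f, c in zip(feat, center))

    return 0.5 * (w_a * sq_dist(centers[cls_a])
                  + (1.0 - w_a) * sq_dist(centers[cls_b]))


# Toy example: a 2-sample clip (padded to 4) mixed with a full 4-sample clip.
mixed, w_a = adaptive_mixup([1.0, 1.0, 0.0, 0.0], 2, [1.0, 1.0, 1.0, 1.0], 4, lam=0.5)
print(mixed)  # [1.0, 1.0, 0.5, 0.5]
print(soft_target(0, 2, w_a, n_classes=4))
```

With λ = 0.5, the short clip receives a label weight of 1/3 instead of 0.5; when both clips have the same real length, the rule reduces to ordinary mixup (w_a = λ). Weighting the center loss by the same probabilities keeps it consistent with the soft cross-entropy target, which is one plausible reading of "fitting center loss to the mixup training scheme".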

Lei Kang, Lichao Zhang, Dazhi Jiang• 2023

Related benchmarks

Task | Dataset | Result | Rank
Speech Emotion Recognition | IEMOCAP, speaker-independent 5-fold cross-validation | WA (weighted accuracy): 75.37 | 19
Emotion Recognition | IEMOCAP full-modality | Weighted Accuracy: 75.4 | 9
Multimodal Emotion Recognition | IEMOCAP full-modality comparison | Weighted Accuracy: 75.4 | 9
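The results above are reported as weighted accuracy (WA). In the SER literature, WA usually denotes plain overall accuracy (correct predictions over all utterances, so frequent classes count more), in contrast to unweighted accuracy (UA), the mean of per-class recalls. The page does not define the metric, so the sketch below assumes this common convention; the function names and the toy labels are made up for illustration and are not data from the paper.

```python
from collections import Counter
from typing import Sequence


def weighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """WA: fraction of all utterances classified correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """UA: recall computed per class, then averaged with equal class weight."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy, made-up labels: "neu" dominates and "ang" is rare.
y_true = ["neu"] * 6 + ["ang"] * 2
y_pred = ["neu"] * 6 + ["neu", "ang"]
print(weighted_accuracy(y_true, y_pred))    # 0.875
print(unweighted_accuracy(y_true, y_pred))  # 0.75
```

Because IEMOCAP's emotion classes are imbalanced, WA and UA can diverge noticeably, which is why SER papers often report both.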
