# Speech Emotion Recognition with ASR Integration

## About
Speech Emotion Recognition (SER) plays a pivotal role in understanding human communication, enabling emotionally intelligent systems, and serving as a fundamental component in the development of Artificial General Intelligence (AGI). However, deploying SER in real-world, spontaneous, and low-resource scenarios remains a significant challenge due to the complexity of emotional expression and the limitations of current speech and language technologies. This thesis investigates the integration of Automatic Speech Recognition (ASR) into SER, with the goal of enhancing the robustness, scalability, and practical applicability of emotion recognition from spoken language.
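To make the integration idea concrete, below is a minimal late-fusion sketch: acoustic features from the waveform are concatenated with features derived from the ASR transcript before emotion classification. All function names, feature choices, and dimensions here are illustrative assumptions, not the thesis's actual method; a real system would use learned acoustic and text encoders.

```python
# Illustrative late-fusion sketch for ASR-integrated SER.
# Every component below is a hypothetical placeholder.

def extract_acoustic_features(waveform):
    # Placeholder: a real system would use spectral/prosodic features
    # or a pretrained speech encoder. Here: mean and dynamic range.
    return [sum(waveform) / len(waveform), max(waveform) - min(waveform)]

def extract_text_features(transcript):
    # Placeholder: a real system would embed the ASR hypothesis with a
    # pretrained language model. Here: trivial surface statistics.
    words = transcript.split()
    return [len(words), sum(len(w) for w in words) / max(len(words), 1)]

def fuse(acoustic, textual):
    # Late fusion by simple concatenation of the two feature vectors;
    # the fused vector would then feed an emotion classifier.
    return acoustic + textual

waveform = [0.0, 0.2, -0.1, 0.4]
transcript = "i am really happy today"   # hypothetical ASR output
features = fuse(extract_acoustic_features(waveform),
                extract_text_features(transcript))
print(features)
```

The design choice sketched here (late fusion) is only one option; the thesis's broader point is that transcript-level information can complement acoustics, especially in spontaneous, low-resource conditions.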
Yuanchao Li • 2026
## Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Emotion Recognition | IEMOCAP | -- | 71 |
| Multimodal Sentiment Analysis | CMU-MOSI | MAE 0.8557 | 59 |
| Sentiment Analysis | CMU-MOSI | Accuracy (2-class) 85.1 | 21 |
| Humor Detection | UR-FUNNY | ACC2 75.09 | 20 |
| Emotion Recognition | CMU-MOSEI | F1 Score 84 | 19 |
| Sarcasm Detection | MUSTARD | Accuracy 76.62 | 13 |
| Dementia Detection | Dementia detection dataset (train) | Unweighted Average Accuracy 80.87 | 9 |
| Emotion Recognition | Emotion recognition dataset (train) | UA (%) 75.1 | 9 |
| ASR Error Correction | ASR Error Correction Evaluation Set (test) | WER 16.07 | 6 |
| Speech Emotion Recognition | MSP-Podcast | WER 12.85 | 3 |