# Speech Emotion Recognition with ASR Integration

## About
Speech Emotion Recognition (SER) plays a pivotal role in understanding human communication, enabling emotionally intelligent systems, and serving as a fundamental component in the development of Artificial General Intelligence (AGI). However, deploying SER in real-world, spontaneous, and low-resource scenarios remains a significant challenge due to the complexity of emotional expression and the limitations of current speech and language technologies. This thesis investigates the integration of Automatic Speech Recognition (ASR) into SER, with the goal of enhancing the robustness, scalability, and practical applicability of emotion recognition from spoken language.
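To make the integration idea concrete, below is a minimal late-fusion sketch: acoustic features from the waveform are concatenated with features derived from the ASR transcript before emotion classification. All function names, feature choices, and dimensions here are illustrative assumptions, not the thesis's actual method; a real system would use learned acoustic and text encoders.

```python
# Illustrative late-fusion sketch for ASR-integrated SER.
# Every component below is a hypothetical placeholder.

def extract_acoustic_features(waveform):
    # Placeholder: a real system would use spectral/prosodic features
    # or a pretrained speech encoder. Here: mean and dynamic range.
    return [sum(waveform) / len(waveform), max(waveform) - min(waveform)]

def extract_text_features(transcript):
    # Placeholder: a real system would embed the ASR hypothesis with a
    # pretrained language model. Here: trivial surface statistics.
    words = transcript.split()
    return [len(words), sum(len(w) for w in words) / max(len(words), 1)]

def fuse(acoustic, textual):
    # Late fusion by simple concatenation of the two feature vectors;
    # the fused vector would then feed an emotion classifier.
    return acoustic + textual

waveform = [0.0, 0.2, -0.1, 0.4]
transcript = "i am really happy today"   # hypothetical ASR output
features = fuse(extract_acoustic_features(waveform),
                extract_text_features(transcript))
print(features)
```

The design choice sketched here (late fusion) is only one option; the thesis's broader point is that transcript-level information can complement acoustics, especially in spontaneous, low-resource conditions.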
Yuanchao Li • 2026
## Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Emotion Recognition | IEMOCAP | -- | 71 |
| Multimodal Sentiment Analysis | CMU-MOSI | MAE 0.8557 | 59 |
| Sentiment Analysis | CMU-MOSI | Accuracy (2-class) 85.1 | 21 |
| Humor Detection | UR-FUNNY | ACC2 75.09 | 20 |
| Emotion Recognition | CMU-MOSEI | F1 Score 84 | 19 |
| Sarcasm Detection | MUSTARD | Accuracy 76.62 | 13 |
| Dementia Detection | Dementia detection dataset (train) | Unweighted Average Accuracy 80.87 | 9 |
| Emotion Recognition | Emotion recognition dataset (train) | UA (%) 75.1 | 9 |
| ASR Error Correction | ASR Error Correction Evaluation Set (test) | WER 16.07 | 6 |
| Speech Emotion Recognition | MSP-Podcast | WER 12.85 | 3 |