ESD

Benchmarks

Task Name	Dataset Name	SOTA Result
Speech Emotion Recognition	ESD EN	Accuracy 93.18	24
Speech Emotion Recognition	ESD CN	Accuracy 95.54	24
Emotional Text-to-Speech	ESD (English)	SMOS 4.35	16
Text-to-Speech	ESD	Score (Angry) 75	15
Text-to-Speech	ESD (test)	MOS 4.47	15
Neural Audio Compression	ESD	ViSQOL 4.61	13
Emotion Preservation	ESD	MEDR 2.2	13
Speech Emotion Recognition	ESD In-Domain v1 (test)	ACC 93.86	13
Object Detection	ESD	AP 46.5	13
Cross-Speaker Emotional Transfer	ESD EN held-out targets {0013, 0019} (test)	EECS 95.7	12
Open-set speaker identification	ESD (test)	EER 0.61	12
Target Speaker Extraction	ESD (test)	SI-SDRi (dB) 16.67	8
Empathetic Response Generation	ESD	Emotional Reaction 1.851	8
Speaker and Emotion Editing	ESD (test)	CER 2.433	7
Emotional Text-to-Speech	ESD plus (test)	WER 4.15	7
Emotion Style Transfer	ESD (test)	UTMOS 3.93	7
Speech Synthesis	ESD Zh	WER 2.4	5
Cross-speaker style transfer	ESD (test)	nMOS 3.638	5
Emotional Speech Synthesis	ESD English (test)	Score (Neutral) 78.39	5
Text-to-Speech	ESD English (test)	WER 6.8	5
Speech Emotion Recognition	ESD	UA 98.9	5
Instance Segmentation	ESD-1 (test)	Accuracy (2 Objects) 95	5
Voice Conversion	ESD	WER 0.149	4
Emotional Speech Synthesis	ESD-plus	Speech Quality (Win Rate) 94.29	3
Chain Generation	ESD-CoT (test)	B-1 44.87	3
