
ESD

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Emotional Text-to-Speech | ESD (English) | SMOS 4.35 | 16 |
| Speech Emotion Recognition | ESD In-Domain v1 (test) | ACC 93.86 | 13 |
| Object Detection | ESD | AP 46.5 | 13 |
| Open-set speaker identification | ESD (test) | EER 0.61 | 12 |
| Text-to-Speech | ESD (test) | MOS 4.47 | 11 |
| Target Speaker Extraction | ESD (test) | SI-SDRi (dB) 16.67 | 8 |
| Empathetic Response Generation | ESD | Emotional Reaction 1.851 | 8 |
| Emotion Style Transfer | ESD (test) | UTMOS 3.93 | 7 |
| Text-to-Speech | ESD | MOS (Happy) 3.87 | 6 |
| Speech Synthesis | ESD Zh | WER 2.4 | 5 |
| Cross-speaker style transfer | ESD (test) | nMOS 3.638 | 5 |
| Emotional Speech Synthesis | ESD English (test) | Score (Neutral) 78.39 | 5 |
| Text-to-Speech | ESD English (test) | WER 6.8 | 5 |
| Speech Emotion Recognition | ESD | UA 98.9 | 5 |
| Instance Segmentation | ESD-1 (test) | Accuracy (2 Objects) 95 | 5 |
| Voice Conversion | ESD | WER 0.149 | 4 |
| Chain Generation | ESD-CoT (test) | B-1 44.87 | 3 |