
ESD

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Emotional Text-to-Speech | ESD (English) | SMOS 4.35 | 16 |
| Text-to-Speech | ESD | Score (Angry) 75 | 15 |
| Neural Audio Compression | ESD | ViSQOL 4.61 | 13 |
| Emotion Preservation | ESD | MEDR 2.2 | 13 |
| Speech Emotion Recognition | ESD In-Domain v1 (test) | ACC 93.86 | 13 |
| Object Detection | ESD | AP 46.5 | 13 |
| Open-set speaker identification | ESD (test) | EER 0.61 | 12 |
| Text-to-Speech | ESD (test) | MOS 4.47 | 11 |
| Target Speaker Extraction | ESD (test) | SI-SDRi (dB) 16.67 | 8 |
| Empathetic Response Generation | ESD | Emotional Reaction 1.851 | 8 |
| Emotion Style Transfer | ESD (test) | UTMOS 3.93 | 7 |
| Speech Synthesis | ESD Zh | WER 2.4 | 5 |
| Cross-speaker style transfer | ESD (test) | nMOS 3.638 | 5 |
| Emotional Speech Synthesis | ESD English (test) | Score (Neutral) 78.39 | 5 |
| Text-to-Speech | ESD English (test) | WER 6.8 | 5 |
| Speech Emotion Recognition | ESD | UA 98.9 | 5 |
| Instance Segmentation | ESD-1 (test) | Accuracy (2 Objects) 95 | 5 |
| Voice Conversion | ESD | WER 0.149 | 4 |
| Chain Generation | ESD-CoT (test) | B-1 44.87 | 3 |
| Emotion and Style Control Speech Generation | ESD | ESD Score 82.1 | 2 |