VCTK

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Audio Super-Resolution | VCTK In-domain | LSD 0.6 | 34 |
| Speech Decompression | VCTK (test) | Log Spectral Distance 1.01 | 28 |
| Phonetic Transcription | VCTK++ (test) | F1 Score 93 | 25 |
| Voice Conversion | VCTK | WER 0 | 21 |
| Text-to-Speech | VCTK | WER 1.7 | 19 |
| Speech Enhancement | VCTK Accelerometer 12-bit, 4-16 kHz upsampling (test) | LSD 0.87 | 18 |
| Speech Enhancement | VCTK Vibration sensor 12-bit, 4-16 kHz upsampling (test) | LSD (Log-Spectral Distance) 0.84 | 18 |
| Speech Super-resolution | VCTK 0.92 (test) | LSD 0.7 | 16 |
| Automatic Speech Recognition | VCTK (test) | WER 3.47 | 15 |
| Audio Super-resolution | VCTK Multi-speaker (test) | SNR 20 | 15 |
| Audio Super-resolution | VCTK Single-speaker (test) | SNR 19.5 | 15 |
| Audio-to-Text Retrieval | VCTK A→T | Recall@1 96.1 | 15 |
| Pitch Shift | VCTK (10% unseen utterances) | MOS 4.05 | 15 |
| Time-scale modification | VCTK (10% unseen utterances) | MOS 3.98 | 15 |
| Speech Super-resolution | VCTK 16 kHz target sampling rate 0.92 (test) | LSD 0.78 | 11 |
| Bandwidth Extension (BWE) | VCTK Google Pixel7 | LSD 0.84 | 10 |
| Bandwidth Extension (BWE) | VCTK Desktop | LSD 0.82 | 10 |
| Bandwidth Extension | VCTK 8 kHz to 44.1 kHz (test) | VISQOL 4.73 | 10 |
| Neural Vocoding | VCTK 100 audio clips (unseen) | MAE 0.0925 | 10 |
| Speaker-ID | VCTK (test) | Accuracy 99.3 | 10 |
| Neural Vocoding | VCTK English Corpus with Unseen Speakers (out-of-domain) | UTMOS 4.117 | 9 |
| Voice Conversion | VCTK (test) | nMOS 4.26 | 9 |
| Speech Synthesis | VCTK (OD) | PESQ 4.5 | 9 |
| Text-to-Speech | VCTK (test) | MOS 4.4 | 8 |
| Neural Vocoding | VCTK (unseen speakers) | MOS 4.37 | 8 |
Showing 25 of 80 rows