
VCTK

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Phonetic Transcription | VCTK++ (test) | F1 Score 93 | 25 |
| Voice Conversion | VCTK | WER 0 | 21 |
| Speech Super-resolution | VCTK 0.92 (test) | LSD 0.7 | 16 |
| Audio Super-resolution | VCTK Multi-speaker (test) | SNR 20 | 15 |
| Audio Super-resolution | VCTK Single-speaker (test) | SNR 19.5 | 15 |
| Audio-to-Text Retrieval | VCTK A→T | Recall@1 96.1 | 15 |
| Pitch Shift | VCTK (10% unseen utterances) | MOS 4.05 | 15 |
| Time-scale modification | VCTK (10% unseen utterances) | MOS 3.98 | 15 |
| Text-to-Speech | VCTK | WER 1.7 | 13 |
| Speech Super-resolution | VCTK 16 kHz target sampling rate 0.92 (test) | LSD 0.78 | 11 |
| Neural Vocoding | VCTK 100 audio clips (unseen) | MAE 0.0925 | 10 |
| Speaker-ID | VCTK (test) | Accuracy 99.3 | 10 |
| Voice Conversion | VCTK (test) | nMOS 4.26 | 9 |
| Speech Synthesis | VCTK (OD) | PESQ 4.5 | 9 |
| Text-to-Speech | VCTK (test) | MOS 4.4 | 8 |
| Neural Vocoding | VCTK (unseen speakers) | MOS 4.37 | 8 |
| Bandwidth Extension | VCTK-BWE BW=2K (test) | WVMOS 4.306 | 7 |
| Speech Separation | VCTK 2 Speech | SI-SDR 14.52 | 7 |
| Audio Super-Resolution | VCTK 4 kHz input sampling rate (test) | WER 1 | 7 |
| Audio Super-Resolution | VCTK 2 kHz input sampling rate (test) | WER 1 | 7 |
| Speech Reconstruction | VCTK subset | PESQ (WB) 2.36 | 7 |
| Dysfluency Detection | VCTK++ | F1 Score 90 | 7 |
| Mel-spectrogram inversion | VCTK (unseen speakers) | MOS 3.79 | 7 |
| Audio Super-Resolution | VCTK (test) | LSD 2.1 | 7 |
| Bandwidth Extension | VCTK-BWE BW=1K (test) | WVMOS 4.154 | 6 |
Showing 25 of 56 rows