VCTK

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Audio Super-Resolution | VCTK In-domain | LSD | 0.6 | 34 |
| Speech Decompression | VCTK (test) | Log Spectral Distance | 1.01 | 28 |
| Phonetic Transcription | VCTK++ (test) | F1 Score | 93 | 25 |
| Voice Conversion | VCTK | WER | 0 | 21 |
| Text-to-Speech | VCTK | WER | 1.7 | 19 |
| Speech Enhancement | VCTK Accelerometer 12-bit, 4-16 kHz upsampling (test) | LSD | 0.87 | 18 |
| Speech Enhancement | VCTK Vibration sensor 12-bit, 4-16 kHz upsampling (test) | LSD (Log-Spectral Distance) | 0.84 | 18 |
| Speech Super-resolution | VCTK 0.92 (test) | LSD | 0.7 | 16 |
| Speech Bandwidth Extension | VCTK English | NISQA-MOS | 4.53 | 15 |
| Automatic Speech Recognition | VCTK (test) | WER | 3.47 | 15 |
| Audio Super-resolution | VCTK Multi-speaker (test) | SNR | 20 | 15 |
| Audio Super-resolution | VCTK Single-speaker (test) | SNR | 19.5 | 15 |
| Audio-to-Text Retrieval | VCTK A→T | Recall@1 | 96.1 | 15 |
| Pitch Shift | VCTK (10% unseen utterances) | MOS | 4.05 | 15 |
| Time-scale modification | VCTK (10% unseen utterances) | MOS | 3.98 | 15 |
| Speech Quality Evaluation | VCTK 48 kHz (test) | STOI | 0.895 | 12 |
| Speech Coding | VCTK 48 kHz (test) | RTF (CPU) | 0.142 | 12 |
| Speech Bandwidth Extension | VCTK noisy (test) | NISQA-MOS | 3.89 | 12 |
| Audio Super-Resolution | VCTK 24 kHz (test) | LSD | 0.74 | 11 |
| Speech Super-resolution | VCTK 16 kHz target sampling rate 0.92 (test) | LSD | 0.78 | 11 |
| Bandwidth Extension (BWE) | VCTK Google Pixel7 | LSD | 0.84 | 10 |
| Bandwidth Extension (BWE) | VCTK Desktop | LSD | 0.82 | 10 |
| Bandwidth Extension | VCTK 8 kHz to 44.1 kHz (test) | VISQOL | 4.73 | 10 |
| Neural Vocoding | VCTK 100 audio clips (unseen) | MAE | 0.0925 | 10 |
| Speaker-ID | VCTK (test) | Accuracy | 99.3 | 10 |
Showing 25 of 90 rows