
Unsupervised pretraining transfers well across languages

About

Cross-lingual and multi-lingual training of Automatic Speech Recognition (ASR) has been extensively investigated in the supervised setting. This assumes the existence of a parallel corpus of speech and orthographic transcriptions. Recently, contrastive predictive coding (CPC) algorithms have been proposed to pretrain ASR systems with unlabelled data. In this work, we investigate whether unsupervised pretraining transfers well across languages. We show that a slight modification of the CPC pretraining extracts features that transfer well to other languages, on par with or even outperforming supervised pretraining. This shows the potential of unsupervised methods for languages with few linguistic resources.
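As background, the CPC objective the abstract refers to trains an encoder to produce latent representations z_t of the audio, an autoregressive network to summarize them into a context c_t, and prediction heads that score the true future latent z_{t+k} against negatives (the InfoNCE loss). Below is a minimal, self-contained PyTorch sketch of this generic CPC setup; all names, layer sizes, and hyper-parameters (CPCModel, info_nce_loss, dim=256, n_predictions=4) are illustrative assumptions, not the paper's exact modified variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CPCModel(nn.Module):
    """Generic CPC model (illustrative): a strided conv encoder yields
    latents z_t, a GRU aggregates them into contexts c_t, and one linear
    head per horizon k predicts the future latent z_{t+k}."""

    def __init__(self, dim=256, n_predictions=4):
        super().__init__()
        # Downsampling conv stack over the raw waveform; sizes are hypothetical.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=10, stride=5, padding=3), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=2, padding=1), nn.ReLU(),
        )
        self.context = nn.GRU(dim, dim, batch_first=True)
        self.heads = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(n_predictions)]
        )

    def forward(self, wav):                      # wav: (B, 1, samples)
        z = self.encoder(wav).transpose(1, 2)    # (B, T, dim) latents
        c, _ = self.context(z)                   # (B, T, dim) contexts
        return z, c

def info_nce_loss(z, c, heads):
    """InfoNCE objective: for each step t and horizon k, classify the true
    future latent z_{t+k} among the other time steps of the same utterance."""
    B, T, _ = z.shape
    total = 0.0
    for k, head in enumerate(heads, start=1):
        pred = head(c[:, : T - k])               # predictions for z_{t+k}
        target = z[:, k:]                        # true future latents
        # Score each prediction against every candidate latent (negatives
        # are the other time steps); the positive sits on the diagonal.
        logits = torch.einsum("btd,bsd->bts", pred, target)
        labels = torch.arange(T - k, device=z.device).expand(B, -1)
        total = total + F.cross_entropy(
            logits.reshape(B * (T - k), -1), labels.reshape(-1)
        )
    return total / len(heads)

# Usage sketch: one gradient step on a batch of 1-second, 16 kHz clips.
model = CPCModel()
wav = torch.randn(8, 1, 16000)
z, c = model(wav)
loss = info_nce_loss(z, c, model.heads)
loss.backward()
```

Negatives here are drawn from other time steps of the same utterance, which keeps the sketch simple; the paper's modified CPC differs in architectural details not reproduced here.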

Morgane Rivière, Armand Joulin, Pierre-Emmanuel Mazaré, Emmanuel Dupoux · 2020

Related benchmarks

Task                                        | Dataset                            | Result                      | Rank
--------------------------------------------|------------------------------------|-----------------------------|-----
Universal Speech Representation Evaluation  | SUPERB Benchmark                   | SID Accuracy: 39.63         | 27
Phone Recognition                           | LibriSpeech train-clean-100 (test) | Phone Accuracy: 83.2        | 14
Frame Classification                        | LibriSpeech train-clean-100 (test) | Frame Accuracy: 67.5        | 8
ABX Phone Discriminability                  | ZeroSpeech 2021 (dev-clean)        | ABX Within-Speaker: 6.68    | 8
Phoneme Recognition                         | CommonVoice (test)                 | Phoneme Error Rate (es): 38 | 7
