Unsupervised pretraining transfers well across languages
About
Cross-lingual and multi-lingual training of Automatic Speech Recognition (ASR) has been extensively investigated in the supervised setting. This assumes the existence of a parallel corpus of speech and orthographic transcriptions. Recently, contrastive predictive coding (CPC) algorithms have been proposed to pretrain ASR systems with unlabelled data. In this work, we investigate whether unsupervised pretraining transfers well across languages. We show that a slight modification of CPC pretraining extracts features that transfer well to other languages, matching or even outperforming supervised pretraining. This shows the potential of unsupervised methods for languages with few linguistic resources.
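CPC learns representations by predicting future encoder states from an autoregressive context and contrasting the true future against negatives (the InfoNCE objective). The sketch below is a minimal, hedged illustration of that objective, not the paper's exact modification; it assumes hypothetical `context` and `encoded` tensors produced upstream, and draws negatives from other time steps of the same utterance for simplicity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CPCLoss(nn.Module):
    """Minimal InfoNCE-style CPC loss sketch (illustrative, not the
    paper's exact variant): predict the encoder state k steps ahead
    from the autoregressive context, scoring it against negatives
    taken from other time steps of the same sequence."""

    def __init__(self, context_dim: int, enc_dim: int, n_steps: int = 12):
        super().__init__()
        # One linear predictor per future offset k = 1..n_steps.
        self.predictors = nn.ModuleList(
            nn.Linear(context_dim, enc_dim, bias=False) for _ in range(n_steps)
        )
        self.n_steps = n_steps

    def forward(self, context: torch.Tensor, encoded: torch.Tensor) -> torch.Tensor:
        # context: (B, T, context_dim) autoregressive states
        # encoded: (B, T, enc_dim) local encoder features
        B, T, _ = encoded.shape
        loss = torch.zeros((), device=encoded.device)
        for k, predictor in enumerate(self.predictors, start=1):
            if T - k <= 0:
                break
            pred = predictor(context[:, : T - k])   # (B, T-k, enc_dim)
            target = encoded[:, k:]                 # (B, T-k, enc_dim)
            # Score every prediction against every target time step;
            # the diagonal entry is the positive pair.
            logits = torch.einsum("bte,bse->bts", pred, target)  # (B, T-k, T-k)
            labels = torch.arange(T - k, device=logits.device).expand(B, -1)
            loss = loss + F.cross_entropy(
                logits.reshape(-1, T - k), labels.reshape(-1)
            )
        return loss / self.n_steps
```

After pretraining with such an objective on unlabelled audio, the frozen (or fine-tuned) encoder features can be fed to a downstream phoneme classifier in another language, which is the transfer setting the paper evaluates.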
Morgane Rivière, Armand Joulin, Pierre-Emmanuel Mazaré, Emmanuel Dupoux • 2020
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Universal Speech Representation Evaluation | SUPERB Benchmark | SID Accuracy | 39.63 | 27 |
| Phone Recognition | LibriSpeech train-clean-100 (test) | Phone Accuracy | 83.2 | 14 |
| Frame Classification | LibriSpeech train-clean-100 (test) | Frame Accuracy | 67.5 | 8 |
| ABX Phone Discriminability | ZeroSpeech 2021 (dev-clean) | ABX Within-Speaker | 6.68 | 8 |
| Phoneme Recognition | CommonVoice (test) | Phoneme Error Rate (es) | 38 | 7 |