Unsupervised pretraining transfers well across languages
About
Cross-lingual and multi-lingual training of Automatic Speech Recognition (ASR) has been extensively investigated in the supervised setting. This assumes the existence of a parallel corpus of speech and orthographic transcriptions. Recently, contrastive predictive coding (CPC) algorithms have been proposed to pretrain ASR systems with unlabelled data. In this work, we investigate whether unsupervised pretraining transfers well across languages. We show that a slight modification of CPC pretraining extracts features that transfer well to other languages, matching or even outperforming supervised pretraining. This shows the potential of unsupervised methods for languages with few linguistic resources.
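CPC learns representations by predicting future encoder states from an autoregressive context and contrasting the true future against negatives (the InfoNCE objective). The sketch below is a minimal, hedged illustration of that objective, not the paper's exact modification; it assumes hypothetical `context` and `encoded` tensors produced upstream, and draws negatives from other time steps of the same utterance for simplicity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CPCLoss(nn.Module):
    """Minimal InfoNCE-style CPC loss sketch (illustrative, not the
    paper's exact variant): predict the encoder state k steps ahead
    from the autoregressive context, scoring it against negatives
    taken from other time steps of the same sequence."""

    def __init__(self, context_dim: int, enc_dim: int, n_steps: int = 12):
        super().__init__()
        # One linear predictor per future offset k = 1..n_steps.
        self.predictors = nn.ModuleList(
            nn.Linear(context_dim, enc_dim, bias=False) for _ in range(n_steps)
        )
        self.n_steps = n_steps

    def forward(self, context: torch.Tensor, encoded: torch.Tensor) -> torch.Tensor:
        # context: (B, T, context_dim) autoregressive states
        # encoded: (B, T, enc_dim) local encoder features
        B, T, _ = encoded.shape
        loss = torch.zeros((), device=encoded.device)
        for k, predictor in enumerate(self.predictors, start=1):
            if T - k <= 0:
                break
            pred = predictor(context[:, : T - k])   # (B, T-k, enc_dim)
            target = encoded[:, k:]                 # (B, T-k, enc_dim)
            # Score every prediction against every target time step;
            # the diagonal entry is the positive pair.
            logits = torch.einsum("bte,bse->bts", pred, target)  # (B, T-k, T-k)
            labels = torch.arange(T - k, device=logits.device).expand(B, -1)
            loss = loss + F.cross_entropy(
                logits.reshape(-1, T - k), labels.reshape(-1)
            )
        return loss / self.n_steps
```

After pretraining with such an objective on unlabelled audio, the frozen (or fine-tuned) encoder features can be fed to a downstream phoneme classifier in another language, which is the transfer setting the paper evaluates.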
Morgane Rivière, Armand Joulin, Pierre-Emmanuel Mazaré, Emmanuel Dupoux • 2020
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Universal Speech Representation Evaluation | SUPERB Benchmark | SID Accuracy | 39.63 | 27 |
| Phone Recognition | LibriSpeech train-clean-100 (test) | Phone Accuracy | 83.2 | 14 |
| Frame Classification | LibriSpeech train-clean-100 (test) | Frame Accuracy | 67.5 | 8 |
| ABX Phone Discriminability | ZeroSpeech 2021 (dev-clean) | ABX Within-Speaker | 6.68 | 8 |
| Phoneme Recognition | CommonVoice (test) | Phoneme Error Rate (es) | 38 | 7 |