Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

About

We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.

Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu • 2023
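
The abstract names random-projection quantization as part of the pre-training recipe without describing it. The general idea can be sketched as follows: a frozen random matrix projects each speech frame into a smaller space, and the index of the nearest entry in a frozen random codebook becomes a discrete label that the encoder can learn to predict for masked frames. This is a minimal illustration under stated assumptions, not the authors' implementation; the dimensions, the Gaussian initialisation, and the names `make_quantizer` and `quantize` are all hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def make_quantizer(feat_dim, proj_dim, codebook_size, seed=0):
    """Build a frozen random projection and a frozen random codebook.

    Neither is trained. Gaussian initialisation is an assumption made
    for this sketch.
    """
    rng = random.Random(seed)
    projection = [[rng.gauss(0.0, 1.0) for _ in range(proj_dim)]
                  for _ in range(feat_dim)]
    codebook = [normalize([rng.gauss(0.0, 1.0) for _ in range(proj_dim)])
                for _ in range(codebook_size)]
    return projection, codebook


def quantize(frame, projection, codebook):
    """Return the index of the codebook entry nearest to the projected frame."""
    proj_dim = len(projection[0])
    projected = normalize([
        sum(frame[i] * projection[i][j] for i in range(len(frame)))
        for j in range(proj_dim)
    ])
    return min(
        range(len(codebook)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(projected, codebook[k])),
    )


# Toy usage: four random 8-dimensional "frames" mapped to discrete labels.
projection, codebook = make_quantizer(feat_dim=8, proj_dim=4, codebook_size=16)
frame_rng = random.Random(1)
frames = [[frame_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
labels = [quantize(f, projection, codebook) for f in frames]
```

Because the projection and codebook are fixed, the same frame always maps to the same label, so the labels can be computed without training a separate quantizer.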

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Automatic Speech Recognition | LibriSpeech (test-other) | WER | 5.2 | 966 |
| Automatic Speech Recognition | LibriSpeech clean (test) | WER | 2.7 | 833 |
| Speech Translation | CoVoST-2 (test) | Avg BLEU (15 Dir) | 30.7 | 46 |
| Automatic Speech Recognition | AISHELL-1 1.0 (test) | CER (Offline, Rescoring) | 5.31 | 7 |
| Automatic Speech Recognition | English Hardcase (test) | F1 Score | 63.3 | 7 |
| Four-way emotion classification | IEMOCAP (leave-one-session-out five-fold cross val) | ACC | 71.06 | 5 |
| Automatic Speech Recognition | English Multi-domain (val) | WER | 9.33 | 4 |
| Automatic Speech Recognition | MLS | WER (ES) | 4.2 | 4 |
| Automatic Speech Recognition | English Multi-accent (evaluation set) | WER | 22.19 | 4 |
| Automatic Speech Recognition | Multilingual Multi-domain (evaluation set) | WER | 21.51 | 3 |
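
Most results above are word error rates (WER) or character error rates (CER). Both are conventionally computed as the edit distance between the reference and the hypothesis, divided by the reference length. The sketch below shows that standard definition only; each benchmark applies its own text normalization and scoring tools, which are not reproduced here, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """Return WER (unit="word") or CER (unit="char") as a percentage."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        # For CER, spaces are dropped here; real toolkits differ on this.
        ref = list(reference.replace(" ", ""))
        hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = error_rate("the cat sat on the mat", "the cat sit on mat")
```

Since insertions count as errors, the rate can exceed 100% when the hypothesis is much longer than the reference.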
