
fairseq S2T: Fast Speech-to-Text Modeling with fairseq

About

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows covering data pre-processing, model training, and offline and online inference. We implement state-of-the-art RNN-based, Transformer-based, and Conformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T documentation and examples are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.
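
A common step in speech pre-processing is utterance-level cepstral mean and variance normalization (CMVN) of filterbank features. It shifts each feature dimension to zero mean and scales it to unit variance, using statistics from that utterance alone. The sketch below is a minimal NumPy illustration of that step. Two things are assumptions, not facts taken from this page: that a given fairseq S2T recipe applies exactly this transform, and the 80-bin feature shape.

```python
import numpy as np


def utterance_cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize a (num_frames, num_bins) feature matrix per utterance.

    Illustrative sketch: statistics are computed over this utterance only,
    then every bin is shifted to zero mean and scaled to unit variance.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    # Guard against division by zero for constant (e.g. silent) bins.
    return (features - mean) / np.maximum(std, eps)


# Synthetic stand-in for log-mel features: 200 frames x 80 bins (assumed shape).
rng = np.random.default_rng(0)
feats = rng.normal(loc=5.0, scale=3.0, size=(200, 80))
normed = utterance_cmvn(feats)
print(normed.mean(axis=0)[:3], normed.std(axis=0)[:3])
```

The alternative is global CMVN, which uses statistics pooled over the training set. Utterance-level statistics avoid that dataset-wide pass, at the cost of noisier estimates on very short utterances.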

Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino · 2020

Related benchmarks

Task | Dataset | Metric | Result | Rank
Automatic Speech Recognition | LibriSpeech (test-other) | WER (%) | 3.5 | 966
Automatic Speech Recognition | LibriSpeech clean (test) | WER (%) | 1.7 | 833
Automatic Speech Recognition | LibriSpeech (dev-other) | WER (%) | 3.5 | 411
Automatic Speech Recognition | LibriSpeech (dev-clean) | WER (%) | 1.7 | 319
Speech Translation | CoVoST-2 (test) | -- | -- | 46
Speech Translation | MuST-C EN-DE (test-COMMON) | BLEU | 22.8 | 41
Speech Recognition | MuST-C (test) | WER (Avg) | 32.6 | 30
Speech Translation | MuST-C (test) | -- | -- | 29
Speech Translation | MuST-C EN-FR COMMON (test) | BLEU | 32.9 | 17
Speech-to-text Translation | MuST-C En-X (tst-COM) | BLEU (German) | 22.7 | 16
The table lists 10 of the 34 benchmark results recorded for this paper.
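
The WER figures above are word error rates. WER is the word-level edit distance (substitutions + deletions + insertions) between a system hypothesis and the reference transcript, divided by the number of reference words. It is usually reported as a percentage. The sketch below computes plain whitespace-tokenized WER with dynamic programming. It is an illustration of the definition only: the text normalization and scoring tools behind the reported benchmark numbers are not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.1%}")  # prints "WER = 33.3%"
```

Insertions count as errors, so WER is not bounded by 100%. For example, a one-word reference with a two-word wrong hypothesis scores 2.0.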
