fairseq S2T: Fast Speech-to-Text Modeling with fairseq
About
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows covering data pre-processing, model training, and offline (online) inference. We implement state-of-the-art RNN-based, Transformer-based, and Conformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T documentation and examples are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text.
Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino • 2020
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Automatic Speech Recognition | LibriSpeech (test-other) | WER 3.5 | 966 |
| Automatic Speech Recognition | LibriSpeech clean (test) | WER 1.7 | 833 |
| Automatic Speech Recognition | LibriSpeech (dev-other) | WER 3.5 | 411 |
| Automatic Speech Recognition | LibriSpeech (dev-clean) | WER (%) 1.7 | 319 |
| Speech Translation | CoVoST-2 (test) | -- | 46 |
| Speech Translation | MuST-C EN-DE (test-COMMON) | BLEU 22.8 | 41 |
| Speech Recognition | MuST-C (test) | WER (Avg) 32.6 | 30 |
| Speech Translation | MuST-C (test) | -- | 29 |
| Speech Translation | MuST-C EN-FR COMMON (test) | BLEU 32.9 | 17 |
| Speech-to-text Translation | MuST-C En-X (tst-COM) | BLEU (German) 22.7 | 16 |
Showing 10 of 34 rows
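
For readers unfamiliar with the WER metric reported in the table, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, insertions, deletions) between a hypothesis and a reference transcript, divided by the number of reference words. The function name `word_error_rate` is hypothetical and is not part of fairseq. Whitespace tokenization and per-utterance scoring are assumptions here; reported benchmark numbers are usually aggregated over a whole test set and may involve text normalization that this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent for one utterance (illustrative helper, not a fairseq API)."""
    # Assumption: plain whitespace tokenization, no text normalization.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Under this definition a lower value is better, and a hypothesis with many insertions can exceed 100%.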