ESPnet-ST: All-in-One Speech Translation Toolkit
About
We present ESPnet-ST, which is designed for the quick development of speech-to-speech translation systems in a single framework. ESPnet-ST is a new project inside the end-to-end speech processing toolkit ESPnet, which integrates or newly implements automatic speech recognition, machine translation, and text-to-speech functions for speech translation. We provide all-in-one recipes including data pre-processing, feature extraction, training, and decoding pipelines for a wide range of benchmark datasets. Our reproducible results can match or even outperform current state-of-the-art performance; the pre-trained models are downloadable. The toolkit is publicly available at https://github.com/espnet/espnet.
Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Enrique Yalta Soplin, Tomoki Hayashi, Shinji Watanabe • 2020
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Speech-to-speech translation | Fisher Spanish-English (test) | -- | 55 |
| Speech Translation | MuST-C EN-DE (test-COMMON) | BLEU 22.9 | 41 |
| Simultaneous Speech Translation | CallHome Spanish-English Es-En (test) | BLEU 19.4 | 18 |
| Speech Translation | MuST-C EN-FR COMMON (test) | BLEU 32.8 | 17 |
| Speech-to-text Translation | MuST-C En-X (tst-COM) | BLEU (German) 23.6 | 16 |
| Speech Translation | libri-trans (test) | Detokenized BLEU (case-sensitive) 17 | 14 |
| Speech Translation | MuST-C EN-ES (tst-COMMON) | BLEU 28 | 14 |
| Speech Translation | MuST-C COMMON (tst) | WER (de) 22.9 | 13 |
| Speech Translation | MuST-C en-nl (tst-COMMON) | BLEU Score 27.4 | 6 |
| Speech Recognition | MuST-C COMMON (test) | WER 12 | 5 |