
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

About

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit, necessitated by the broadening interests of the spoken language translation community. ESPnet-ST-v2 supports 1) offline speech-to-text translation (ST), 2) simultaneous speech-to-text translation (SST), and 3) offline speech-to-speech translation (S2ST). Each task is supported by a wide variety of approaches, which differentiates ESPnet-ST-v2 from other open-source spoken language translation toolkits. The toolkit offers state-of-the-art architectures such as transducers, hybrid CTC/attention, multi-decoders with searchable intermediates, time-synchronous blockwise CTC/attention, Translatotron models, and direct discrete unit models. In this paper, we describe the overall design, example models for each task, and performance benchmarking behind ESPnet-ST-v2, which is publicly available at https://github.com/espnet/espnet.
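Hybrid CTC/attention models combine a CTC score with an attention-decoder score when choosing an output. A minimal sketch of that combination, assuming complete hypotheses are rescored after decoding (ESPnet itself scores prefixes incrementally inside beam search, which this does not reproduce): `ctc_log_likelihood` runs the standard CTC forward algorithm over per-frame log posteriors, and `joint_score` linearly interpolates it with an attention log-probability. The function names, the default weight of 0.3, and the toy inputs are illustrative assumptions, not ESPnet's API.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def log_add(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_likelihood(log_probs: Sequence[Sequence[float]],
                       labels: Sequence[int], blank: int = 0) -> float:
    """Return log p_ctc(labels | x) via the CTC forward algorithm.

    log_probs[t][k] is the log posterior of token k at frame t (blank included).
    """
    # Interleave blanks: [blank, y1, blank, y2, ..., blank]
    ext = [blank]
    for y in labels:
        ext += [y, blank]
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new_alpha = [NEG_INF] * S
        for s in range(S):
            paths = [alpha[s]]                    # stay on the same symbol
            if s >= 1:
                paths.append(alpha[s - 1])        # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                paths.append(alpha[s - 2])        # skip a blank between distinct labels
            new_alpha[s] = log_add(*paths) + frame[ext[s]]
        alpha = new_alpha
    # Valid paths end on the last label or on the trailing blank.
    return log_add(alpha[-1], alpha[-2]) if S > 1 else alpha[0]


def joint_score(ctc_ll: float, att_ll: float, ctc_weight: float = 0.3) -> float:
    """Interpolate CTC and attention log-likelihoods (illustrative weight)."""
    return ctc_weight * ctc_ll + (1.0 - ctc_weight) * att_ll


if __name__ == "__main__":
    # Toy posteriors: 2 frames, vocabulary {0: blank, 1: "a"}, uniform per frame.
    lp = [[math.log(0.5), math.log(0.5)]] * 2
    ctc = ctc_log_likelihood(lp, [1])
    print(round(math.exp(ctc), 6))  # paths "aa", "a_", "_a" -> 0.75
    print(joint_score(ctc, math.log(0.5)))
```

Rescoring an N-best list then amounts to keeping the hypothesis with the largest `joint_score`; the CTC term penalizes attention outputs that cannot be aligned monotonically to the input frames (for example, a repeated label with too few frames receives a CTC log-likelihood of -inf).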

Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe • 2023

Related benchmarks

Task | Dataset | Metric | Score | Rank
Simultaneous Speech Translation | MuST-C EN-DE (tst-COMMON) | BLEU | 23.5 | 39
Speech-to-Speech Translation | CVSS-C Es-En v1 (test) | ASR-BLEU | 32 | 8
Streaming Speech-to-Text Translation | IWSLT En-De tst-COMMON 2022 v2 (test) | BLEU | 26.6 | 5
Offline Speech Translation | MuST-C v1 (test) | BLEU (DE) | 27.9 | 4
Simultaneous Speech Translation | MuST-C v1 (test) | BLEU (DE) | 23.5 | 2
Offline Speech-to-Speech Translation | CVSS-C v1 (test) | ASR-BLEU (DE) | 23.7 | 2
