ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

About

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit, necessitated by the broadening interests of the spoken language translation community. ESPnet-ST-v2 supports 1) offline speech-to-text translation (ST), 2) simultaneous speech-to-text translation (SST), and 3) offline speech-to-speech translation (S2ST). Each task is supported with a wide variety of approaches, which differentiates ESPnet-ST-v2 from other open-source spoken language translation toolkits. This toolkit offers state-of-the-art architectures such as transducers, hybrid CTC/attention, multi-decoders with searchable intermediates, time-synchronous blockwise CTC/attention, Translatotron models, and direct discrete unit models. In this paper, we describe the overall design, example models for each task, and performance benchmarking behind ESPnet-ST-v2, which is publicly available at https://github.com/espnet/espnet.

Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe • 2023

Related benchmarks

Task	Dataset	Result
Simultaneous Speech Translation	MuST-C EN-DE (tst-COMMON)	BLEU 23.5	39
Speech-to-speech translation	CVSS-C Es-En v1 (test)	ASR-BLEU 32	8
Streaming Speech-to-Text Translation	IWSLT En-De tst-COMMON 2022 v2 (test)	BLEU 26.6	5
Offline Speech Translation	MuST-C v1 (test)	BLEU (DE) 27.9	4
Simultaneous Speech Translation	MuST-C v1 (test)	BLEU (DE) 23.5	2
Offline Speech-to-Speech Translation	CVSS-C v1 (test)	ASR-BLEU (DE) 23.7	2
