
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

About

Direct speech-to-speech translation (S2ST) has drawn increasing attention recently. The task is very challenging due to data scarcity and the complex speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. First, we build an S2ST Transformer baseline that outperforms the original Translatotron. Second, we utilize external data through pseudo-labeling and obtain a new state-of-the-art result on the Fisher English-to-Spanish test set. Specifically, we exploit the pseudo data with a combination of popular techniques that are not trivial to apply to S2ST. Moreover, we evaluate our approach on both syntactically similar (Spanish-English) and distant (English-Chinese) language pairs. Our implementation is available at https://github.com/fengpeng-yue/speech-to-speech-translation.

Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai, Yu Zhang • 2022

Related benchmarks

Task                          Dataset                        Result                    Rank
Speech-to-speech translation  Fisher Spanish-English (test)  BLEU (Speech Input) 46.3  55
Speech-to-speech translation  Fisher Spanish-English (dev)   BLEU (Speech) 45.5        48
Speech-to-speech translation  CVSS-C                         Avg Score 0.273           38
Speech-to-speech translation  Fisher Spanish-English (dev2)  ASR BLEU 47.6             36
Speech-to-speech translation  Fisher Es→En (dev)             ASR chrF 63.8             10
Speech-to-speech translation  Fisher Es→En (test)            ASR chrF 64.9             10
