CTC-based Compression for Direct Speech Translation

About

Previous studies demonstrated that a dynamic phone-informed compression of the input audio is beneficial for speech translation (ST). However, they required a dedicated model for phone recognition and did not test this solution for direct ST, in which a single model translates the input audio into the target language without intermediate representations. In this work, we propose the first method able to perform a dynamic compression of the input in direct ST models. In particular, we exploit the Connectionist Temporal Classification (CTC) to compress the input sequence according to its phonetic characteristics. Our experiments demonstrate that our solution brings a 1.3-1.5 BLEU improvement over a strong baseline on two language pairs (English-Italian and English-German), while also reducing the memory footprint by more than 10%.

Marco Gaido, Mauro Cettolo, Matteo Negri, Marco Turchi • 2021
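
The compression idea in the abstract can be illustrated with a minimal sketch. It assumes that consecutive frames receiving the same CTC prediction are merged by averaging their vectors, a merging rule the abstract does not spell out. The function name `ctc_compress`, the toy frame vectors, and the label ids are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from itertools import groupby


def ctc_compress(frames, labels):
    """Merge runs of consecutive frames that share the same predicted label.

    frames: list of equal-length feature vectors, one per time step.
    labels: per-frame label ids (e.g. the argmax of CTC outputs), same length.
    Averaging is an assumed merging rule, used here for illustration.
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")
    compressed = []
    start = 0
    for _, run in groupby(labels):
        n = len(list(run))                  # length of this run of equal labels
        segment = frames[start:start + n]   # frames belonging to the run
        dim = len(segment[0])
        # Element-wise mean of the vectors in the run.
        compressed.append([sum(v[d] for v in segment) / n for d in range(dim)])
        start += n
    return compressed


# Toy input: six 2-dimensional frames with hypothetical label ids (0 = blank).
frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0], [0.0, 4.0], [2.0, 6.0], [4.0, 8.0]]
labels = [7, 7, 0, 3, 3, 3]
compressed = ctc_compress(frames, labels)
print(len(frames), "->", len(compressed))  # 6 -> 3
```

In this toy case six frames collapse into three vectors, one per run of identical predictions. Shortening the sequence in this way is what would lower the memory cost of the layers that follow the compression point.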

Related benchmarks

Task                Dataset                     Result      Rank
Speech Translation  MuST-C EN-DE (tst-COMMON)   BLEU 22.8   41
Speech Translation  MuST-C EN-ES (tst-COMMON)   BLEU 27.9   14
Speech Translation  MuST-C en-de (dev)          BLEU 22.3   14
Speech Translation  MuST-C en-nl (tst-COMMON)   BLEU 27     6
Speech Translation  MuST-C en-es (dev)          BLEU 0.311  4
Speech Translation  MuST-C en-nl (dev)          BLEU 24.2   4