SimulU: Training-free Policy for Long-form Simultaneous Speech-to-Speech Translation

About

Simultaneous speech-to-speech translation (SimulS2S) is essential for real-time multilingual communication, with increasing integration into meeting and streaming platforms. Despite this, SimulS2S remains underexplored in research, where current solutions often rely on resource-intensive training procedures and operate on short-form, pre-segmented utterances, failing to generalize to continuous speech. To bridge this gap, we propose SimulU, the first training-free policy for long-form SimulS2S. SimulU adopts history management and speech output selection strategies that exploit cross-attention in pre-trained end-to-end models to regulate both input history and output generation. Evaluations on MuST-C across 8 languages show that SimulU achieves a better or comparable quality-latency trade-off against strong cascaded models. By eliminating the need for ad-hoc training, SimulU offers a promising path to end-to-end SimulS2S in realistic, long-form scenarios.

Amirbek Djanibekov, Luisa Bentivogli, Matteo Negri, Sara Papi • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Simultaneous Speech Translation | MuST-C EN-DE (tst-COMMON) | -- | 39 |
| Simultaneous Speech-to-Speech Translation | MuST-C en-fr v1 (tst-COMMON) | End Offset Mean (ms): 224 | 2 |
| Simultaneous Speech-to-Speech Translation | MuST-C en-it v1 (tst-COMMON) | End Offset Mean (ms): 100 | 2 |
| Simultaneous Speech-to-Speech Translation | MuST-C en-pt v1 (tst-COMMON) | End Offset Mean (ms): 146 | 2 |
| Simultaneous Speech-to-Speech Translation | MuST-C en-es v1 (tst-COMMON) | End Offset Mean (ms): 106 | 2 |
| Simultaneous Speech-to-Speech Translation | MuST-C en-ro v1 (tst-COMMON) | End Offset Mean Latency (ms): 34 | 2 |
| Simultaneous Speech-to-Speech Translation | MuST-C en-nl v1 (tst-COMMON) | End Offset Mean (ms): 82 | 2 |
| Simultaneous Speech-to-Speech Translation | MuST-C en-ru v1 (tst-COMMON) | End Offset Mean Latency (ms): 106 | 2 |