
Regularized Entropy Information Adaptation with Temporal-Awareness Networks for Simultaneous Speech Translation

About

Simultaneous Speech Translation (SimulST) requires balancing high translation quality with low latency. Recent work introduced REINA, a method that trains a Read/Write policy by estimating the information gain of reading more audio. However, we find that information-based policies often lack temporal context, which biases the policy toward reading most of the audio before starting to write. We improve REINA using two distinct strategies: a supervised alignment network (REINA-SAN) and a timestep-augmented network (REINA-TAN). Our results demonstrate that both methods significantly outperform the baseline and resolve stability issues; REINA-TAN provides a slightly superior Pareto frontier for streaming efficiency, whereas REINA-SAN offers more robustness against 'read loops'. Applied to Whisper, both methods improve the Pareto frontier of streaming efficiency, as measured by Normalized Streaming Efficiency (NoSE) scores, by up to 7.1% over existing competitive baselines.
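As a rough illustration of the Read/Write setting described above, the following Python sketch runs a generic threshold policy: it reads another audio chunk while an estimated information gain exceeds a threshold, and otherwise writes a token. This is not the authors' implementation of REINA, REINA-SAN, or REINA-TAN; the function names, the `threshold` parameter, and both toy gain estimators are hypothetical and exist only to show how a gain estimate that ignores timing can end up reading all the audio before writing anything.

```python
from typing import Callable, List


def run_read_write_policy(
    num_chunks: int,
    num_tokens: int,
    estimate_gain: Callable[[int, int], float],
    threshold: float,
) -> List[str]:
    """Return the READ/WRITE action sequence for one utterance.

    `estimate_gain(chunks_read, tokens_written)` is a hypothetical stand-in
    for a learned estimate of how much reading more audio would help.
    """
    actions: List[str] = []
    chunks_read, tokens_written = 0, 0
    while tokens_written < num_tokens:
        audio_left = chunks_read < num_chunks
        # READ only if audio remains and more audio is expected to help.
        if audio_left and estimate_gain(chunks_read, tokens_written) > threshold:
            chunks_read += 1
            actions.append("READ")
        else:
            tokens_written += 1
            actions.append("WRITE")
    return actions


def balanced_gain(chunks_read: int, tokens_written: int) -> float:
    # Toy estimator: gain is high only while the audio read so far
    # has not gotten ahead of the text written so far.
    return 1.0 if chunks_read <= tokens_written else 0.0


def always_high_gain(chunks_read: int, tokens_written: int) -> float:
    # Toy estimator with no notion of progress: it always favors reading.
    return 1.0


interleaved = run_read_write_policy(3, 3, balanced_gain, threshold=0.5)
read_first = run_read_write_policy(3, 3, always_high_gain, threshold=0.5)
print(interleaved)  # ['READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE']
print(read_first)   # ['READ', 'READ', 'READ', 'WRITE', 'WRITE', 'WRITE']
```

The second run shows the failure mode in miniature: an estimator that always favors reading consumes every chunk before the first write, which yields high latency even when translation quality is good.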

Joseph Liu, Nameer Hirschkind, Xiao Yu, Mahesh Kumar Nandwana • 2026

Related benchmarks

Task                             Dataset               Result            Rank
Simultaneous Speech Translation  FLEURS de-en (test)   NoSE Score: 99.1  4
Simultaneous Speech Translation  FLEURS fr-en (test)   NoSE: 98.5        4
Simultaneous Speech Translation  FLEURS es-en (test)   NoSE Score: 97.5  4
