Transformer-Encoder Trees for Efficient Multilingual Machine Translation and Speech Translation
About
Multilingual translation suffers from computational redundancy, especially when translating into multiple languages simultaneously, and translation quality often degrades for low-resource languages. To address this, we introduce the Transformer Encoder Tree (TET), a hierarchical, non-autoregressive encoder-only architecture trained with Connectionist Temporal Classification (CTC) for multilingual translation. TET shares intermediate representations among linguistically similar target languages, improving accuracy on low-resource languages while reducing computational redundancy and enabling the generation of all target languages in a single forward pass. TET eliminates the sequential bottleneck of autoregressive models and supports fully parallel decoding of all tokens across all target languages. Compared to a naive one-to-many multilingual design, TET reduces the total parameter count by 66% and lowers inference computation by 60%. In speech translation, TET combined with a non-autoregressive speech recognition backbone (Wav2Vec2) achieves translation quality competitive with autoregressive systems while speeding up inference by approximately 7-14 times.
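The tree structure described above can be sketched as follows. This is a minimal illustration, not the actual implementation: NumPy linear maps stand in for Transformer encoder stacks, the class names (`EncoderStage`, `EncoderTree`) and the example language grouping are invented for the sketch, and greedy argmax decoding with CTC collapse replaces a trained CTC model. It shows the key idea: a root encoder is computed once, branch encoders are shared within each language group, and all target languages are produced in one forward pass.

```python
import numpy as np

def ctc_collapse(ids, blank=0):
    """Greedy CTC post-processing: merge repeated tokens, then drop blanks."""
    out, prev = [], None
    for t in ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

class EncoderStage:
    """Stand-in for one Transformer encoder stack: a random linear map + ReLU.
    (A real TET stage would be a stack of self-attention layers.)"""
    def __init__(self, dim, seed):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    def __call__(self, x):
        return np.maximum(x @ self.w, 0.0)

class EncoderTree:
    """Hierarchical encoder: one root stage shared by all targets, one branch
    stage per language group, and a per-language CTC projection head."""
    def __init__(self, dim, vocab_size, groups, seed=0):
        # groups: e.g. {"germanic": ["de", "da"], "romance": ["fr"]} (illustrative)
        self.root = EncoderStage(dim, seed)
        self.branches = {g: EncoderStage(dim, seed + i + 1)
                         for i, g in enumerate(groups)}
        rng = np.random.default_rng(seed + 100)
        self.heads = {lang: rng.standard_normal((dim, vocab_size))
                      for langs in groups.values() for lang in langs}
        self.groups = groups

    def forward(self, x):
        h = self.root(x)                        # shared computation, done once
        out = {}
        for g, langs in self.groups.items():
            hg = self.branches[g](h)            # shared within the language group
            for lang in langs:
                logits = hg @ self.heads[lang]  # (T, V) CTC logits per language
                out[lang] = ctc_collapse(logits.argmax(axis=-1).tolist())
        return out

tree = EncoderTree(dim=16, vocab_size=32,
                   groups={"germanic": ["de", "da"], "romance": ["fr"]})
source = np.random.default_rng(1).standard_normal((10, 16))  # toy source states
translations = tree.forward(source)  # all target languages in one pass
```

Because the root and branch stages are reused across languages, adding another language to an existing group costs only one extra projection head, which is where the parameter and computation savings over a naive one-to-many design come from.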
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Machine Translation | Tatoeba (test) | SacreBLEU (Da) | 29.5 | 8 |
| Speech-to-text Translation | Tatoeba (4,392 utterances) | -- | -- | 6 |
| Machine Translation | Multi30K (test) | SacreBLEU (De) | 40.6 | 4 |