
Joint Source-Target Self Attention with Locality Constraints

About

The dominant neural machine translation models are based on the encoder-decoder structure, and many of them rely on an unconstrained receptive field over the source and target sequences. In this paper we study a new architecture that breaks with both conventions. Our simplified architecture consists of the decoder part of a transformer model, based on self-attention, but with locality constraints applied to the attention receptive field. For training, the source and target sentences are concatenated and fed to the network, which is trained as a language model. At inference time, the target tokens are predicted autoregressively, starting with the source sequence as the previous tokens. The proposed model achieves a new state of the art of 35.7 BLEU on IWSLT'14 German-English and matches the best reported results in the literature on the WMT'14 English-German and WMT'14 English-French translation benchmarks.
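The core mechanism the abstract describes can be sketched as an attention mask over the concatenated [source; target] sequence that is both causal (autoregressive) and local. The sketch below is a minimal illustration of that idea in NumPy; the function name, the single fixed window size, and the exact masking details are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def joint_attention_mask(src_len, tgt_len, window):
    """Boolean self-attention mask over the concatenated [source; target]
    sequence (True = "may attend").

    Combines two constraints described in the abstract:
      - causal: each position attends only to itself and earlier positions,
        so the model can be trained as a language model;
      - locality: each position attends only to the `window` most recent
        positions (a hypothetical fixed-window form of the constraint).
    """
    n = src_len + tgt_len
    i = np.arange(n)[:, None]   # query positions
    j = np.arange(n)[None, :]   # key positions
    causal = j <= i             # autoregressive receptive field
    local = (i - j) < window    # locality constraint on the receptive field
    return causal & local

# Toy example: 3 source tokens, 2 target tokens, window of 2.
mask = joint_attention_mask(src_len=3, tgt_len=2, window=2)
# Row i of `mask` allows exactly the keys j with i - window < j <= i.
```

At inference, the source tokens fill the first `src_len` positions and target tokens are generated one at a time, each constrained by the same mask row.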

José A. R. Fonollosa, Noe Casas, Marta R. Costa-jussà • 2019

Related benchmarks

Task                 | Dataset                  | Metric     | Result | Rank
Machine Translation  | WMT En-De 2014 (test)    | BLEU       | 29.7   | 379
Machine Translation  | WMT En-Fr 2014 (test)    | BLEU       | 43.3   | 237
Machine Translation  | IWSLT De-En 2014 (test)  | BLEU       | 35.7   | 146
Machine Translation  | WMT en-fr 14             | BLEU Score | 43.3   | 56

Other info

Code
