Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation

About

Most Neural Machine Translation (NMT) models are based on the sequence-to-sequence (Seq2Seq) model with an encoder-decoder framework equipped with the attention mechanism. However, the conventional attention mechanism treats decoding at every time step identically, using the same matrix, which is problematic because the softness of the attention should differ for different types of words (e.g. content words and function words). We therefore propose a new model with a mechanism called Self-Adaptive Control of Temperature (SACT), which controls the softness of attention by means of an attention temperature. Experimental results on Chinese-English and English-Vietnamese translation demonstrate that our model outperforms the baseline models, and the analysis and case study show that it can attend to the most relevant elements in the source-side context and generate high-quality translations.
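To illustrate what "softness of attention" means here, the following is a minimal sketch of a softmax over alignment scores divided by a temperature. It is not the SACT model itself: the abstract does not give the formula by which the temperature is learned, so the temperature is passed in as a fixed number, and the function names and example scores are hypothetical.

```python
import math


def attention_weights(scores, temperature=1.0):
    """Softmax over alignment scores divided by a temperature.

    Assumption: a plain temperature-scaled softmax. In the paper the
    temperature is predicted at each decoding step; here it is a fixed input.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(weights):
    """Shannon entropy of a distribution; higher means softer attention."""
    return -sum(w * math.log(w) for w in weights if w > 0)


# Hypothetical alignment scores for three source tokens.
scores = [2.0, 1.0, 0.1]

sharp = attention_weights(scores, temperature=0.5)  # concentrated attention
soft = attention_weights(scores, temperature=2.0)   # diverted, flatter attention

print("low temperature :", [round(w, 3) for w in sharp], round(entropy(sharp), 3))
print("high temperature:", [round(w, 3) for w in soft], round(entropy(soft), 3))
```

A lower temperature concentrates the weight on the highest-scoring source token, while a higher temperature spreads it across the context. This is the knob the abstract describes adjusting per decoding step.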

Junyang Lin, Xu Sun, Xuancheng Ren, Muyu Li, Qi Su • 2018

Related benchmarks

Task	Dataset	Result
Machine Translation (Chinese-to-English)	NIST 2003 (MT-03)	BLEU 38.16	52
Machine Translation (Chinese-to-English)	NIST MT-05 2005	BLEU 36.81	42
Machine Translation	NIST MT 04 2004 (test)	BLEU 0.4048	27
Machine Translation	NIST MT 06 2006 (test)	BLEU 35.95	27
Machine Translation	IWSLT English-Vietnamese 2015 (tst2013)	BLEU 29.12	23
Machine Translation	IWSLT En-Vi 2015 (test)	BLEU 29.1	17
Machine Translation	NIST 03-06 Average (test)	BLEU 37.85	6

