Self-Attention with Relative Position Representations

About

Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation quality. We describe an efficient implementation of our method and cast it as an instance of relation-aware self-attention mechanisms that can generalize to arbitrary graph-labeled inputs.

Peter Shaw, Jakob Uszkoreit, Ashish Vaswani • 2018
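
The abstract above describes extending self-attention so that it considers representations of relative positions between sequence elements. As a rough illustration, here is a minimal single-head NumPy sketch of one way to do that. It assumes learned relative-position vectors are added on both the key side and the value side, and that offsets are clipped to a maximum distance. The function names, the `max_distance` parameter, and all shapes are illustrative assumptions, not code from the authors.

```python
import numpy as np


def relative_position_index(length, max_distance):
    """Return a (length, length) matrix of clipped offsets j - i, shifted into [0, 2 * max_distance]."""
    positions = np.arange(length)
    offsets = positions[None, :] - positions[:, None]  # offsets[i, j] = j - i
    clipped = np.clip(offsets, -max_distance, max_distance)
    return clipped + max_distance  # shift so the result can index an embedding table


def relative_self_attention(x, w_q, w_k, w_v, rel_k, rel_v, max_distance):
    """Single-head self-attention with relative position terms (illustrative sketch).

    x:            (n, d_model) input sequence
    w_q/w_k/w_v:  (d_model, d_head) projection matrices
    rel_k/rel_v:  (2 * max_distance + 1, d_head) relative-position tables (assumed layout)
    """
    n = x.shape[0]
    d_head = w_q.shape[1]
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    idx = relative_position_index(n, max_distance)
    a_k = rel_k[idx]  # (n, n, d_head): key-side vector for each (i, j) pair
    a_v = rel_v[idx]  # (n, n, d_head): value-side vector for each (i, j) pair

    # Content-content term plus content-position term, then scaling.
    logits = (q @ k.T + np.einsum("id,ijd->ij", q, a_k)) / np.sqrt(d_head)

    # Numerically stable softmax over the key axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values plus weighted sum of value-side position vectors.
    out = weights @ v + np.einsum("ij,ijd->id", weights, a_v)
    return out, weights
```

This sketch materializes full `(n, n, d_head)` tensors for clarity, which costs memory quadratic in sequence length. It does not attempt to reproduce the efficient implementation the abstract mentions. Setting both relative-position tables to zero reduces it to ordinary scaled dot-product attention without any position signal.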

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Machine Translation | WMT En-De 2014 (test) | BLEU 29.2 | 379 |
| Machine Translation | WMT En-Fr 2014 (test) | BLEU 41.5 | 237 |
| Machine Translation | WMT English-German 2014 (test) | BLEU 29.2 | 136 |
| Machine Translation | WMT14 En-De newstest2014 (test) | BLEU 29.2 | 65 |
| Machine Translation | WMT en-fr 14 | BLEU Score 41.5 | 56 |
| Machine Translation | WMT En-Fr newstest 2014 (test) | BLEU 41.5 | 46 |
| Machine Translation | WMT English-French 2014 (test) | BLEU 41.5 | 41 |
| Machine Translation | WMT14 English-French (newstest2014) | BLEU 41.5 | 39 |
| Image Classification | ImageNet 4 (val) | Accuracy 80.9 | 30 |
| Music Modeling | J.S. Bach Chorales 16th notes (val) | Validation NLL 0.357 | 25 |
Showing 10 of 13 rows
