Confidence through Attention
About
Attention distributions of the generated translations are a useful by-product of attention-based recurrent neural network translation models and can be treated as soft alignments between the input and output tokens. In this work, we use attention distributions as a confidence metric for output translations. We present two strategies for using the attention distributions: filtering out bad translations from a large back-translated corpus, and selecting the best translation in a hybrid setup of two different translation systems. While manual evaluation indicated only a weak correlation between our confidence score and human judgments, the use cases showed improvements of up to 2.22 BLEU points for filtering and 0.99 points for hybrid translation, tested on English<->German and English<->Latvian translation.
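The idea of turning an attention matrix into a confidence score can be sketched as follows. This is a minimal, illustrative implementation that scores a translation by how sharply each output token's attention is focused (via normalized entropy); the exact scoring function used in the paper may differ, and the function name and input format here are assumptions.

```python
import math

def attention_confidence(attn):
    """Score one translation from its attention matrix.

    attn: one row per output token; each row is the attention
    distribution over the input tokens (sums to ~1).
    Returns a value in (0, 1]; higher means more sharply focused
    (alignment-like, hence more "confident") attention.
    Note: this entropy-based scoring is an illustrative assumption,
    not necessarily the metric used in the paper.
    """
    penalties = []
    for row in attn:
        # entropy of this output token's attention distribution
        h = -sum(p * math.log(p) for p in row if p > 0)
        # normalize by the maximum entropy (uniform attention)
        max_h = math.log(len(row)) if len(row) > 1 else 1.0
        penalties.append(h / max_h)
    # invert the mean normalized entropy so peaked attention scores high
    return 1.0 - sum(penalties) / len(penalties)

# sharply peaked attention (clear soft alignment) -> higher score
focused = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]]
# diffuse attention (no clear alignment) -> lower score
diffuse = [[0.34, 0.33, 0.33], [0.33, 0.34, 0.33]]
print(attention_confidence(focused) > attention_confidence(diffuse))
```

With a score like this, filtering a back-translated corpus amounts to dropping sentence pairs below a threshold, and hybrid selection amounts to keeping the candidate with the higher score.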
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Machine Translation (De-En) | news 200 random sentences 2017 (dev) | BLEU | 35.19 | 4 |
| Machine Translation (De-En) | newstest 2017 (test) | BLEU | 29.47 | 4 |
| Machine Translation (En-De) | news 200 random sentences 2017 (dev) | BLEU | 30.19 | 4 |
| Machine Translation (Lv-En) | news 200 random sentences 2017 (dev) | BLEU | 11.23 | 4 |
| Machine Translation (Lv-En) | news 2017 (test) | BLEU | 14.83 | 4 |
| Machine Translation (En-De) | newstest 2017 (test) | BLEU | 23.16 | 4 |
| Machine Translation (En-Lv) | news 2017 (dev) | BLEU | 14.79 | 4 |
| Machine Translation (Lv-En) | news 2017 (dev) | BLEU | 12.65 | 4 |
| Machine Translation (De-En) | news 2017 (dev) | BLEU | 27.06 | 4 |
| Machine Translation (En-De) | news 2017 (dev) | BLEU | 20.19 | 4 |