Offline bilingual word vectors, orthogonal transformations and the inverted softmax

About

Usually bilingual word vectors are trained "online". Mikolov et al. showed they can also be found "offline", whereby two pre-trained embeddings are aligned with a linear transformation, using dictionaries compiled from expert knowledge. In this work, we prove that the linear transformation between two spaces should be orthogonal. This transformation can be obtained using the singular value decomposition. We introduce a novel "inverted softmax" for identifying translation pairs, with which we improve the precision @1 of Mikolov's original mapping from 34% to 43%, when translating a test set composed of both common and rare English words into Italian. Orthogonal transformations are more robust to noise, enabling us to learn the transformation without expert bilingual signal by constructing a "pseudo-dictionary" from the identical character strings which appear in both languages, achieving 40% precision on the same test set. Finally, we extend our method to retrieve the true translations of English sentences from a corpus of 200k Italian sentences with a precision @1 of 68%.
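The two techniques named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the row-vector convention (`X_dict @ W`), and the default `beta` value are assumptions made for illustration. The orthogonal map is computed as the standard orthogonal Procrustes solution via the SVD, and the inverted softmax is taken to mean normalising each target word's scores over the source words rather than over the target words, which down-weights "hub" targets that are close to many sources.

```python
import numpy as np

def normalize_rows(M):
    """Scale every row to unit length (assumes no all-zero rows)."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def orthogonal_map(X_dict, Y_dict):
    """Orthogonal W minimising ||X_dict @ W - Y_dict||_F.

    X_dict, Y_dict: (n_pairs, dim) arrays of source/target vectors for the
    dictionary pairs, row i of X_dict translating to row i of Y_dict.
    """
    U, _, Vt = np.linalg.svd(X_dict.T @ Y_dict)
    return U @ Vt

def inverted_softmax_translate(S, beta=10.0):
    """Pick a target index for each source word.

    S[j, i] is the similarity between source word j and target word i.
    beta is an inverse temperature (the value here is arbitrary).
    """
    Z = beta * S
    Z = Z - Z.max(axis=0, keepdims=True)   # numerical stability, per target column
    P = np.exp(Z)
    P /= P.sum(axis=0, keepdims=True)      # normalise over source words, not targets
    return P.argmax(axis=1)

# Toy similarity matrix in which target 0 is a hub: plain nearest-neighbour
# retrieval maps both source words to it.
S = np.array([[0.9, 0.8],
              [0.9, 0.1]])
print(S.argmax(axis=1))                  # nearest neighbour
print(inverted_softmax_translate(S))     # inverted softmax
```

In a full pipeline one would apply `W` to all source vectors, form `S` from cosine similarities against the target vocabulary, and tune `beta` on the training dictionary; those steps are omitted here.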

Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla • 2017

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Word Translation | Word Translation ru-en fastText Wikipedia (test) | P@1: 63.8 | 7 |
| Word Translation | Word Translation zh-en fastText Wikipedia (test) | P@1: 37.5 | 7 |
| Word Translation | Word Translation eo-en fastText Wikipedia (test) | P@1: 27.9 | 7 |
| Word Translation | en-fr Word Translation fastText Wikipedia (test) | P@1: 81.1 | 7 |
| Word Translation | en-ru Word Translation fastText Wikipedia (test) | P@1: 49.5 | 7 |
| Word Translation | en-eo Word Translation fastText Wikipedia (test) | P@1: 29 | 7 |
| Sentence translation retrieval | Europarl English to Italian (test) | P@1: 54.6 | 7 |
| Word Translation | WaCky English-to-Italian 1,500 query source words (test) | P@1: 43.1 | 7 |
| Word Translation | WaCky Italian-to-English 1,500 query source words (test) | P@1: 38 | 7 |
| Word Translation | en-es Word Translation fastText Wikipedia (test) | P@1: 81.1 | 7 |
Showing 10 of 17 rows