
Learning Joint Multilingual Sentence Representations with Neural Machine Translation

About

In this paper, we use the framework of neural machine translation to learn joint sentence representations across six very different languages. Our premise is that a representation which is independent of the language is likely to capture the underlying semantics. We define a new cross-lingual similarity measure, compare up to 1.4M sentence representations, and study the characteristics of close sentences. We provide experimental evidence that sentences that are close in embedding space are indeed semantically highly related, even though they often differ substantially in structure and syntax. These relations also hold when comparing sentences in different languages.
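The paper does not publish its similarity code here, but the core idea of comparing joint sentence embeddings across languages can be sketched with plain cosine similarity. The following is a minimal illustration, not the authors' implementation: the function names, the toy vectors, and the embedding dimension are all made up for the example.

```python
# Hedged sketch of cross-lingual nearest-neighbour search over joint
# sentence embeddings using cosine similarity (illustrative only; not
# the paper's actual similarity measure or code).
import numpy as np

def normalize(X):
    """L2-normalize rows so that dot products equal cosine similarities."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def nearest(query_vecs, index_vecs):
    """For each query embedding, return the index of the closest
    embedding in index_vecs under cosine similarity."""
    sims = normalize(query_vecs) @ normalize(index_vecs).T
    return sims.argmax(axis=1)

# Toy example: three "English" embeddings and three "French" embeddings,
# where matching sentence pairs point in nearly the same direction in the
# shared space (the property a joint multilingual encoder aims for).
rng = np.random.default_rng(0)
en = rng.normal(size=(3, 8))
fr = en + 0.01 * rng.normal(size=(3, 8))  # near-duplicates of the English vectors
print(nearest(en, fr))
```

In practice such a search is run over millions of embeddings, so an approximate-nearest-neighbour index (e.g. FAISS) replaces the brute-force matrix product shown here.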

Holger Schwenk, Matthijs Douze · 2017

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Semantic Similarity | Semantic Similarity Cross-lingual XL | Pearson Correlation Coefficient: 0.78 | 24 |
| Multi-task Evaluation | Aggregate All tasks (summary) | Score: 64.9 | 20 |
| Question Retrieval | NQ (Natural Questions) (full) | Retrieval Accuracy: 40.9 | 12 |
| Bitext Mining | BUCC (full) | F1 (Cosine Similarity): 85.9 | 12 |
| Question Retrieval | MKQA (full) | Retrieval Accuracy: 29.6 | 12 |
| Semantic Similarity | Semantic Similarity English-only | Pearson's r: 74 | 12 |
| Semantic Similarity | Semantic Similarity Cross-lingual same language XL s. | Pearson's r: 0.798 | 12 |
| Bitext Mining | Tatoeba (full) | Accuracy: 78.2 | 12 |
