Multilingual Universal Sentence Encoder for Semantic Retrieval

About

We introduce two pre-trained retrieval-focused multilingual sentence encoding models, based on the Transformer and CNN model architectures respectively. The models embed text from 16 languages into a single semantic space using a multi-task trained dual-encoder that learns tied representations via translation-based bridge tasks (Chidambaram et al., 2018). The models are competitive with the state of the art on semantic retrieval (SR), translation pair bitext retrieval (BR), and retrieval question answering (ReQA). On English transfer learning tasks, our sentence-level embeddings approach, and in some cases exceed, the performance of monolingual, English-only sentence embedding models. Our models are available for download on TensorFlow Hub.
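Retrieval with a dual-encoder reduces to nearest-neighbor search in the shared embedding space: queries and candidates are embedded once, then ranked by cosine similarity. A minimal sketch of that ranking step is below; the toy 4-dimensional vectors stand in for real model outputs (the released models produce 512-dimensional embeddings, loadable from TensorFlow Hub).

```python
import numpy as np

def retrieve(query_emb, candidate_embs, top_k=3):
    """Return indices of the top_k candidates by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity of each candidate against the query
    return np.argsort(-scores)[:top_k]

# Toy embeddings; a real pipeline would use the encoder's sentence vectors.
candidates = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.8, 0.2, 0.1, 0.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])
print(retrieve(query, candidates, top_k=2))  # → [0 2]
```

Because both encoders map into the same space, the same ranking works whether query and candidates are in the same language or not.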

Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil • 2019

Related benchmarks

Task | Dataset | Metric | Result | Rank
Paraphrase Identification | TwitterPara (test) | TURL | 77.1 | 22
Question Retrieval | CQADupStack (dev) | Average Precision | 0.159 | 22
Question Retrieval | AskUbuntu (dev) | AP | 59.3 | 22
Scientific Document Retrieval | SciDocs (dev) | Cite | 67.1 | 22
Intent Detection | BANKING 10-shot (test) | Accuracy | 84.23 | 16
Intent Detection | HWU 10-shot (test) | Accuracy | 83.75 | 16
Intent Detection | CLINC 10-shot (test) | Accuracy | 90.85 | 16
Question Response Pairing | BBAI 19 agents 1.0 (test) | Accuracy | 71.66 | 15
Retrieval Question Answering | SQuAD | MRR | 62.5 | 14
Sentence-level retrieval | ReQA NQ (test) | MRR | 58.2 | 13

(Showing 10 of 42 rows.)
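Several rows above report MRR (mean reciprocal rank), which averages the reciprocal of the rank at which the first relevant item appears for each query. A minimal illustration of the metric:

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """Average of 1/rank of the first relevant item across queries."""
    total = 0.0
    for ranking, rel in zip(ranked_lists, relevant):
        for rank, item in enumerate(ranking, start=1):
            if item == rel:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Two toy queries: correct answer retrieved at rank 1 and at rank 4.
print(mean_reciprocal_rank([["a", "b"], ["x", "y", "z", "w"]], ["a", "w"]))
# → 0.625  (i.e. (1/1 + 1/4) / 2)
```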
