Ruri: Japanese General Text Embeddings

About

We report the development of Ruri, a series of Japanese general text embedding models. Although general-purpose text embedding models have been actively developed for English and multilingual settings in recent years, development for Japanese has lagged behind. The primary reasons are the lack of datasets and the absence of the necessary expertise. In this report, we provide a detailed account of how Ruri was developed. Specifically, we discuss training embedding models on synthetic datasets generated by LLMs, constructing a reranker used for dataset filtering and knowledge distillation, and evaluating the performance of the resulting general-purpose text embedding models.
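The abstract names two uses of the reranker, filtering synthetic data and knowledge distillation, without detailing them here. The sketch below illustrates one common way each could work, under stated assumptions: a reranker that assigns a relevance score to every query-passage pair, a fixed score threshold for filtering, and a temperature-scaled KL divergence between the reranker's and the embedding model's score distributions as the distillation objective. The function names (`filter_pairs`, `distillation_kl`), the threshold, and the temperature are hypothetical; this is not the report's implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; not the Ruri authors' code.
Pair = Tuple[str, str]  # (query, passage)


def filter_pairs(pairs: Sequence[Pair], scores: Sequence[float], threshold: float) -> List[Pair]:
    """Keep only pairs whose reranker (teacher) score is at least `threshold`."""
    return [pair for pair, score in zip(pairs, scores) if score >= threshold]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, used here as the embedding model's (student's) score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(teacher: Sequence[float], student: Sequence[float], temperature: float = 1.0) -> float:
    """KL(teacher || student) over one query's candidate passages (assumed objective)."""
    p = softmax(teacher, temperature)
    q = softmax(student, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Toy values for illustration only; not data from the report.
    pairs = [("q1", "p1"), ("q1", "p2"), ("q2", "p3")]
    reranker_scores = [0.9, 0.2, 0.7]
    print(filter_pairs(pairs, reranker_scores, threshold=0.5))
    # -> [('q1', 'p1'), ('q2', 'p3')]

    query_vec = [1.0, 0.0]
    passage_vecs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
    student_scores = [cosine(query_vec, p) for p in passage_vecs]
    teacher_scores = [3.0, -1.0, 1.5]  # hypothetical reranker logits
    print(distillation_kl(teacher_scores, student_scores, temperature=0.5))
```

Filtering discards synthetic pairs the reranker judges irrelevant, while minimizing the KL term pushes the embedding model's similarity ranking toward the reranker's; the report's actual filtering criterion and distillation loss may differ.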

Hayato Tsukagoshi, Ryohei Sasano · 2024

Related benchmarks

Task | Dataset | Result | Rank
Information Retrieval | Past marketplace search logs (test) | nDCG@k: 0.198 | 2
