
CroSearch-R1: Better Leveraging Cross-lingual Knowledge for Retrieval-Augmented Generation

About

A multilingual collection may contain useful knowledge in other languages that can supplement and correct facts in the original language for Retrieval-Augmented Generation (RAG). However, the vanilla approach of simply concatenating knowledge from different languages into the context may fail to improve effectiveness because of potential disparities across languages. To better leverage multilingual knowledge, we propose CroSearch-R1, a search-augmented reinforcement learning framework that integrates multilingual knowledge into the Group Relative Policy Optimization (GRPO) process. In particular, the approach adopts a multi-turn retrieval strategy with cross-lingual knowledge integration to dynamically align knowledge from other languages, as supplementary evidence, into a unified representation space. Furthermore, we introduce a multilingual rollout mechanism to optimize reasoning transferability across languages. Experimental results demonstrate that our framework effectively leverages cross-lingual complementarity and improves the effectiveness of RAG with multilingual collections.
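As a rough illustration of the group-relative step that GRPO is named for, the sketch below normalizes each rollout's reward against the mean and standard deviation of its own group. It is not the authors' implementation: the function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the example group of per-language rollouts with 0/1 rewards are all assumptions made for illustration, and the abstract does not specify how the multilingual rollouts are grouped or rewarded.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group (GRPO-style).

    Assumption: population standard deviation plus a small eps;
    real implementations differ in these details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: one question, one rollout per query language,
# each scored 1.0 (correct answer) or 0.0 (incorrect answer).
rollouts = [("en", 1.0), ("fr", 1.0), ("th", 0.0), ("ar", 0.0)]
advantages = group_relative_advantages([reward for _, reward in rollouts])

for (lang, reward), adv in zip(rollouts, advantages):
    print(f"{lang}: reward={reward:.1f} advantage={adv:+.3f}")
```

Rollouts that beat the group mean receive a positive advantage and the rest a negative one, so no separate value model is needed to produce the baseline. When every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.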

Rui Qi, Fengran Mo, Sijin Lu, Yufeng Chen, Jian-Yun Nie, Kaiyu Huang • 2026

Related benchmarks

Task                              Dataset                        Metric  Result  Rank
Cross-lingual Question Answering  MKQA English                   fEM     72.07   14
Cross-lingual Question Answering  MKQA French                    fEM     59.67   14
Cross-lingual Question Answering  MKQA Thai                      fEM     27.83   14
Cross-lingual Question Answering  MKQA Arabic                    fEM     24.12   14
Cross-lingual Question Answering  MKQA Average across languages  fEM     45.92   14
Monolingual Question Answering    PopQA                          fEM     65.04   14
Monolingual Question Answering    2Wiki                          fEM     61.13   14
Monolingual Question Answering    English Datasets Average       fEM     55.53   14
Monolingual Question Answering    HotpotQA                       fEM     44.14   14
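The MKQA average row can be checked against the four per-language rows listed here. The short sketch below recomputes it as an unweighted mean; the dictionary name `mkqa_fem` is only an illustrative label, and treating the average as an unweighted mean over exactly these four languages is an assumption that the numbers happen to support.

```python
# fEM scores copied from the MKQA rows of the benchmark listing.
mkqa_fem = {
    "English": 72.07,
    "French": 59.67,
    "Thai": 27.83,
    "Arabic": 24.12,
}

# Unweighted mean over the four listed languages.
average = sum(mkqa_fem.values()) / len(mkqa_fem)
print(round(average, 2))  # 45.92
```

The same check does not carry over to the English Datasets Average row: the three English datasets listed (PopQA, 2Wiki, HotpotQA) average to 56.77 rather than 55.53, which suggests that average covers datasets not shown in this listing.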