
GainRAG: Preference Alignment in Retrieval-Augmented Generation through Gain Signal Synthesis

About

The Retrieval-Augmented Generation (RAG) framework introduces a retrieval module to dynamically inject retrieved information into the input context of large language models (LLMs), and has demonstrated significant success on various NLP tasks. However, recent studies point out that there is a preference gap between retrievers and LLMs in the RAG framework, which limits further improvement of system performance. Some highly relevant passages may interfere with LLM reasoning because they contain complex or contradictory information, while some indirectly related or even inaccurate content may help the LLM generate more accurate answers by providing suggestive information or logical clues. To address this, we propose GainRAG, a novel approach that aligns the retriever's and the LLM's preferences by defining a new metric, "gain", which measures how well an input passage contributes to correct outputs. Specifically, we propose a method to estimate these gain signals and train a middleware that aligns the preferences of the retriever and the LLM using only limited data. In addition, we introduce a pseudo-passage strategy to mitigate degradation. Experimental results on 6 datasets verify the effectiveness of GainRAG.
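The core idea of a "gain" signal can be illustrated with a minimal sketch: score a passage by how much it improves the generator's chance of producing the correct answer, relative to a closed-book baseline. The functions below are hypothetical stand-ins (the paper's actual estimator uses the LLM itself); here a toy token-overlap scorer keeps the sketch self-contained and runnable.

```python
# Hypothetical sketch of a gain signal, assuming gain is defined as:
#   gain(p) = score(answer | query + passage) - score(answer | query)
# A real system would compute these scores with the generator LLM;
# the toy scorer below uses token overlap purely for illustration.

def answer_likelihood(query, answer, passage=None):
    """Toy stand-in for an LLM's confidence in `answer` given the context:
    the fraction of answer tokens that appear in the context."""
    context = ((passage or "") + " " + query).lower().split()
    context_tokens = set(context)
    answer_tokens = answer.lower().split()
    hits = sum(tok in context_tokens for tok in answer_tokens)
    return hits / max(len(answer_tokens), 1)

def gain(query, answer, passage):
    """Gain of a passage: answer likelihood with it minus likelihood without."""
    return (answer_likelihood(query, answer, passage)
            - answer_likelihood(query, answer))

query = "Who wrote Hamlet?"
answer = "William Shakespeare"
helpful = "Hamlet is a tragedy written by William Shakespeare around 1600."
distracting = "Hamlet is a small village in rural England."

print(gain(query, answer, helpful))      # positive: passage supports the answer
print(gain(query, answer, distracting))  # no gain despite lexical relevance
```

The distracting passage is highly "relevant" to the query by surface overlap but contributes nothing toward the correct answer, which is exactly the retriever/LLM preference gap the gain signal is meant to expose.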

Yi Jiang, Sendong Zhao, Jianbo Li, Haochun Wang, Bing Qin• 2025

Related benchmarks

Task                            Dataset                   Metric    Result   Rank
Question Answering              WebQ                      EM        16.5     27
Question Answering              PopQA                     EM        30.1     17
Question Answering              TriviaQA                  EM        50.3     17
Biomedical Question Answering   BioASQ (test)             ROUGE     49.3     8
Question Answering              HotpotQA                  CoverEM   38.0     4
Question Answering              PopQA                     CoverEM   44.4     4
Question Answering              Natural Questions (NQ)    CoverEM   39.1     4
Question Answering              WebQuestions (WebQ)       CoverEM   36.9     4
Question Answering              TriviaQA                  CoverEM   64.2     4
