Guaranteeing Knowledge Integration with Joint Decoding for Retrieval-Augmented Generation

About

Retrieval-Augmented Generation (RAG) significantly enhances Large Language Models (LLMs) by providing access to external knowledge. However, current research primarily focuses on retrieval quality, often overlooking the critical ''integration bottleneck'': even when relevant documents are retrieved, LLMs frequently fail to utilize them effectively due to conflicts with their internal parametric knowledge. In this paper, we argue that implicitly resolving this conflict in a single generation pass is suboptimal. We introduce GuarantRAG, a framework that explicitly decouples reasoning from evidence integration. First, we generate an ''Inner-Answer'' based solely on parametric knowledge to capture the model's reasoning flow. Second, to guarantee faithful evidence extraction, we generate a ''Refer-Answer'' using a novel Contrastive DPO objective. This objective treats the parametric Inner-Answer as a negative constraint and the retrieved documents as positive ground truth, forcing the model to suppress internal hallucinations in favor of external evidence during this phase. Finally, rather than naive concatenation or using the DPO trained model directly, we propose a joint decoding mechanism that dynamically fuses the logical coherence of the Inner-Answer with the factual precision of the Refer-Answer at the token level. Experiments on five QA benchmarks demonstrate that GuarantRAG improves accuracy by up to 12.1% and reduces hallucinations by 16.3% compared to standard and dynamic RAG baselines.

Zhengyi Zhao, Shubo Zhang, Zezhong Wang, Yuxi Zhang, Huimin Wang, Yutian Zhao, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu• 2026

Related benchmarks

Task	Dataset	Result
Long-form Question Answering	ELI5	--	57
Retrieval-Augmented Generation	All Datasets Aggregated	Average Performance Score76.6	55
Question Answering	TruthfulQA	Performance Score81.1	52
Question Answering	Average of 5 datasets	Average Score78.9	46
Knowledge Integration Quality	NQ, TruthfulQA, WoW, HotpotQA, ELI5 Aggregate	Average Performance76.7	32
Multi-hop Question Answering	HotpotQA	Performance Score78.2	32
Knowledge Grounded Dialogue	Wizard of Wikipedia (WoW)	Performance Score77.8	32
Question Answering	NQ	Average Performance Score85.7	20
Question Answering	WoW	Average Performance Score84.3	20
Question Answering	HotpotQA	Average Performance Score78.4	20

Showing 10 of 11 rows

Other info

Follow for update

@wizwand_team Discord