
ConRAG: Consensus-Driven Multi-View Retrieval for Multi-Hop Question Answering

About

Retrieval-augmented generation (RAG) has emerged as a promising paradigm for enhancing large language models (LLMs) on multi-hop question answering (QA), which requires reasoning over evidence from multiple documents. Current multi-hop RAG methods generally focus on either query-side task decomposition or corpus-side knowledge graph construction. Despite their progress, these methods still struggle to achieve satisfactory performance on complex multi-hop QA tasks. To this end, we propose ConRAG, a consensus-driven multi-view RAG framework that effectively improves LLM performance on complex multi-hop QA. The core idea of ConRAG is to systematically optimize both the query and corpus sides and to leverage multi-view evidence (relation, entity, and text signals) for more accurate retrieval. Extensive experiments on three multi-hop QA benchmarks show that ConRAG consistently outperforms all baselines by a clear margin, with average performance gains of up to +26.9% over vanilla RAG, and enables Gemma-4-31B to set a new state-of-the-art result on the challenging MuSiQue benchmark.

Yikai Zhu, Kunfeng Chen, Qihuang Zhong, Juhua Liu, Bo Du • 2026
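The abstract does not say how ConRAG reaches consensus across its relation, entity, and text views. Purely as an illustration of the general idea of consensus over multiple retrieval views, the sketch below fuses per-view rankings with reciprocal rank fusion (RRF) and keeps only documents that at least two views agree on. RRF itself, the constant k=60, the min_votes threshold, the view names, and the helper consensus_fuse are all assumptions for this example, not ConRAG's published method.

```python
from collections import defaultdict


def consensus_fuse(view_rankings, k=60, min_votes=2):
    """Fuse ranked document lists from several retrieval views (hypothetical helper).

    view_rankings maps a view name (e.g. "relation", "entity", "text") to a
    list of document ids ordered best-first. Each view adds a reciprocal-rank
    fusion score 1 / (k + rank) per document, and only documents returned by
    at least `min_votes` views survive (a simple consensus filter; this is an
    assumption, not ConRAG's actual rule).
    """
    scores = defaultdict(float)
    votes = defaultdict(int)
    for ranking in view_rankings.values():
        seen = set()
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in seen:  # count each document at most once per view
                continue
            seen.add(doc_id)
            scores[doc_id] += 1.0 / (k + rank)
            votes[doc_id] += 1
    agreed = [doc_id for doc_id in scores if votes[doc_id] >= min_votes]
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(agreed, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from three views for one multi-hop question.
views = {
    "relation": ["d1", "d3", "d5"],
    "entity": ["d3", "d1", "d2"],
    "text": ["d3", "d4", "d1"],
}
print(consensus_fuse(views))  # ['d3', 'd1']
```

RRF is used here because it combines views by rank alone, so scores from heterogeneous retrievers (for example, graph-based relation matching versus dense text similarity) need no calibration onto a common scale; raising min_votes trades recall for stronger cross-view agreement.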

Related benchmarks

Task | Dataset | Metric | Result | Rank
Multi-hop Question Answering | HotpotQA | LLM Judge Score | 70.4 | 72
Multi-hop Question Answering | MuSiQue | String Accuracy | 48.4 | 44
Multi-hop Question Answering | 2WikiMultihopQA | String Accuracy | 70.2 | 44
Multi-hop Question Answering | 2WikiMultihopQA | Average Inference Time (s) | 5.64 | 13
