ConRAG: Consensus-Driven Multi-View Retrieval for Multi-Hop Question Answering

About

Retrieval-augmented generation (RAG) has emerged as a promising paradigm for enhancing large language models (LLMs) on multi-hop question answering (QA), which requires reasoning over evidence from multiple documents. Current multi-hop RAG methods generally focus on either query-side task decomposition or corpus-side knowledge graph construction. Despite their progress, these methods still struggle to achieve satisfactory performance on complex multi-hop QA tasks. To address this, we propose ConRAG, a consensus-driven multi-view RAG framework that effectively boosts LLMs on complex multi-hop QA. The core of ConRAG is to systematically optimize both the query and corpus sides and to leverage multi-view evidence (relation, entity, and text signals) for more accurate retrieval. Extensive experiments on three multi-hop QA benchmarks show that ConRAG consistently outperforms all baselines by a clear margin, with average performance gains of up to 26.9% over vanilla RAG, and enables Gemma-4-31B to achieve a new state-of-the-art result on the challenging MuSiQue benchmark.
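
The abstract does not say how the relation, entity, and text views are combined, so the following is only an illustrative sketch, not ConRAG's actual procedure. It assumes each view produces its own ranked list of document IDs and substitutes a well-known generic technique, reciprocal rank fusion, plus a simple "appears in at least `min_views` views" filter to stand in for consensus. The function name `consensus_fuse`, its parameters, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def consensus_fuse(view_rankings, min_views=2, k=60):
    """Fuse per-view rankings with reciprocal rank fusion (RRF).

    view_rankings: dict mapping a view name to a list of document IDs,
        best first. Each ID is assumed to appear at most once per view.
    min_views: a document is kept only if at least this many views
        retrieved it (a stand-in for "consensus", not the paper's rule).
    k: RRF smoothing constant; 60 is a commonly used default.
    """
    scores = defaultdict(float)
    votes = defaultdict(int)
    for ranking in view_rankings.values():
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
            votes[doc_id] += 1
    kept = [doc_id for doc_id in scores if votes[doc_id] >= min_views]
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(kept, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from the three views named in the abstract.
views = {
    "relation": ["d3", "d1", "d7"],
    "entity": ["d1", "d3", "d9"],
    "text": ["d1", "d5", "d3"],
}
fused = consensus_fuse(views)
print(fused)
```

In this toy input only `d1` and `d3` are retrieved by more than one view, so the single-view documents are filtered out before ranking. Lowering `min_views` to 1 turns the function into plain RRF over all retrieved documents.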

Yikai Zhu, Kunfeng Chen, Qihuang Zhong, Juhua Liu, Bo Du • 2026

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	HotpotQA	LLM Judge Score: 70.4	72
Multi-hop Question Answering	MuSiQue	String Accuracy: 48.4	44
Multi-hop Question Answering	2WikiMultihopQA	String Accuracy: 70.2	44
Multi-hop Question Answering	2WikiMultihopQA	Average Inference Time (s): 5.64	13
