GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval

About

Decomposition-based multi-hop retrieval methods rely on many autoregressive steps to break down complex queries, which breaks end-to-end differentiability and is computationally expensive. Decomposition-free methods tackle this, but current decomposition-free approaches struggle with longer multi-hop problems and generalization to out-of-distribution data. To address these challenges, we introduce GRITHopper-7B, a novel multi-hop dense retrieval model that achieves state-of-the-art performance on both in-distribution and out-of-distribution benchmarks. GRITHopper combines generative and representational instruction tuning by integrating causal language modeling with dense retrieval training. Through controlled studies, we find that incorporating additional context after the retrieval process, referred to as post-retrieval language modeling, enhances dense retrieval performance. By including elements such as final answers during training, the model learns to better contextualize and retrieve relevant information. GRITHopper-7B offers a robust, scalable, and generalizable solution for multi-hop dense retrieval, and we release it to the community for future research and applications requiring multi-hop reasoning and retrieval capabilities.
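To make the decomposition-free idea concrete, here is a minimal sketch of one common formulation: at each hop the query is the original question concatenated with the passages retrieved so far, and that text is embedded directly, with no sub-question generated in between. This is an illustration under stated assumptions, not GRITHopper's implementation. The `embed` function is a toy bag-of-words stand-in for a dense encoder, and the names `embed`, `cosine`, and `multi_hop_retrieve`, the query format, and the example corpus are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder (assumption): L2-normalised word counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {tok: v / norm for tok, v in counts.items()} if norm else {}


def cosine(a, b):
    """Dot product of two normalised sparse vectors, i.e. cosine similarity."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def multi_hop_retrieve(question, corpus, hops=2):
    """Retrieve one passage per hop without generating sub-questions."""
    chain = []
    for _ in range(hops):
        # Hypothetical query format: the question plus everything retrieved so far.
        query = " ".join([question] + chain)
        q_vec = embed(query)
        candidates = [p for p in corpus if p not in chain]
        if not candidates:
            break
        # Pick the passage most similar to the current query embedding.
        best = max(candidates, key=lambda p: cosine(q_vec, embed(p)))
        chain.append(best)
    return chain


corpus = [
    "The director of Inception is Christopher Nolan",
    "Christopher Nolan was born in London",
    "Mount Everest is in Nepal",
]
chain = multi_hop_retrieve("Where was the director of Inception born", corpus)
for hop, passage in enumerate(chain, start=1):
    print(hop, passage)
```

The second hop only succeeds because the first retrieved passage adds "Christopher Nolan" to the query text. In a real system the toy `embed` would be replaced by the model's encoder.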

Justus-Jonas Erker, Nils Reimers, Iryna Gurevych • 2025

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Multi-hop Question Answering | HotpotQA (test) | -- | 311
Multi-hop Question Answering | MuSiQue (test) | -- | 128
Multi-hop QA Retrieval | 2WikiMultiHopQA (test) | -- | 33
Multi-hop document retrieval | HotpotQA (test) | Recall@K: 91.03 | 24
Multi-hop document retrieval | MuSiQue (test) | Recall@K: 0.6048 | 24
Multi-hop Retrieval | MoreHopQA (test) | Recall: 74.82 | 16
Multi-hop Retrieval | Average (HotpotQA, 2WikiMultihopQA, Musique, Morehopqa) (test) | Average Recall: 71.58 | 16
Multi-hop Retrieval | HotpotQA | Latency (s): 65.51 | 15
Multi-hop Retrieval | MuSiQue | Latency (s/query): 1.36 | 9
Multi-hop Question Answering | MoreHopQA (test) | Accuracy: 48.66 | 9
Showing 10 of 11 rows
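
For readers unfamiliar with the retrieval metric, the sketch below shows one common definition of Recall@K: the fraction of gold supporting passages that appear among the top K retrieved results. The page does not state K or the exact definition behind each row, so this is an assumption for illustration only. The function name `recall_at_k` and the passage IDs are hypothetical.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold passages that appear among the top-k ranked results."""
    gold = set(gold_ids)
    if not gold:
        raise ValueError("gold_ids must not be empty")
    hits = gold.intersection(ranked_ids[:k])
    return len(hits) / len(gold)


# Hypothetical ranking for one multi-hop question with two gold passages.
ranked = ["p7", "p2", "p9", "p4"]
gold = ["p2", "p4"]
r2 = recall_at_k(ranked, gold, k=2)  # only p2 is in the top 2
r4 = recall_at_k(ranked, gold, k=4)  # both gold passages are in the top 4
print(r2, r4)
```

Averaging this value over all test questions gives a dataset-level score, which can be reported either as a fraction or as a percentage.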
