FrugalRAG: Less is More in RL Finetuning for Multi-Hop Question Answering

About

Reinforcement learning (RL) based on the final answer's reward has driven recent progress in small language models (SLMs) on reasoning-heavy tasks such as math and code. However, applying the same techniques to retrieval-augmented generation (RAG) benchmarks like multi-hop QA has yielded limited gains, often trailing supervised or prompting-only baselines. Instead, we argue that a viable path for RL in multi-hop QA is to use test-time scaling judiciously to optimize both final answer accuracy and efficiency in reaching that answer. We propose FrugalRAG, a two-stage finetuning framework that adaptively reduces the number of retrieval steps based on a question's difficulty. First, we train an SLM with supervised finetuning on a full-exploration policy that generates broad sub-queries. Then, we apply RL to adaptively prune search depth based on question difficulty, directly rewarding policies that balance correctness with frugality. Unlike prior approaches requiring 10x more data, our method achieves competitive performance with only approximately 1,000 examples. On HotpotQA and other multi-hop QA benchmarks, FrugalRAG attains state-of-the-art efficiency-accuracy tradeoffs, cutting retrieval cost nearly in half. Moreover, on the challenging BrowseCompPlus benchmark, it generalizes zero-shot and surpasses SLM-based and other baselines. These results demonstrate the use of RL not to increase reasoning steps, but to reduce them, as an effective solution for scalable and efficient RAG.
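
The idea of rewarding policies that balance correctness with frugality can be illustrated with a small sketch. The abstract does not state the actual reward used by FrugalRAG, so everything below is an assumption made for illustration: the function name `frugal_reward`, the `penalty` weight, and the linear penalty on the fraction of the search budget used are all hypothetical.

```python
def frugal_reward(is_correct: bool, num_searches: int,
                  max_searches: int, penalty: float = 0.5) -> float:
    """Hypothetical reward trading off correctness against retrieval cost.

    NOTE: an illustrative assumption, not the reward defined in the paper.
    A wrong answer earns 0. A correct answer earns 1 minus a penalty that
    grows linearly with the fraction of the search budget that was used.
    """
    if max_searches <= 0:
        raise ValueError("max_searches must be positive")
    if not 0.0 <= penalty < 1.0:
        raise ValueError("penalty must be in [0, 1)")
    if not is_correct:
        return 0.0
    # Clamp the step count into [0, max_searches] before normalising.
    used_fraction = min(max(num_searches, 0), max_searches) / max_searches
    return 1.0 - penalty * used_fraction


if __name__ == "__main__":
    # A correct answer reached with fewer searches scores higher.
    print(frugal_reward(True, 2, 4))
    print(frugal_reward(True, 4, 4))
    print(frugal_reward(False, 1, 4))
```

Because `penalty` is kept below 1 in this sketch, any correct answer outscores any incorrect one, so the frugality term only ranks trajectories that already reach the right answer.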

Abhinav Java, Srivathsan Koundinyan, Nagarajan Natarajan, Amit Sharma • 2025

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	2WikiMultihopQA	--	559
Multi-hop Question Answering	HotpotQA fullwiki setting (dev)	--	38
Web-based Question Answering	BrowseComp+	Accuracy 21.53	22
Multi-hop Question Answering	2Wiki	MBE 51.2	17
Multi-hop Question Answering	HotpotQA	MBE 58	17
Multi-hop Retrieval	HotpotQA	Latency (s) 0.2415	15
Multi-hop Question Answering	MuSiQue	Recall 52.6	6
Multi-hop Question Answering	2WikiMultiHopQA full-wiki (dev)	MBE 47.6	6
Multi-hop Question Answering	MuSiQue full-wiki (dev)	MBE 30.1	6
