Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs

About

While search-augmented large language models (LLMs) exhibit impressive capabilities, their reliability in complex multi-hop reasoning remains limited. This limitation arises from three fundamental challenges: decomposition errors, where tasks are incorrectly broken down; missing retrieval, where key evidence fails to be retrieved; and reasoning errors, where flawed logic propagates through the reasoning chain. A single failure at any of these stages can derail the final answer. We propose Erasable Reinforcement Learning (ERL), a framework that explicitly identifies faulty steps, erases them, and regenerates the reasoning in place, preventing defective logic from propagating through the reasoning chain. This targeted correction turns brittle reasoning into a more resilient process. Models trained with ERL, termed ESearch, achieve substantial improvements on HotpotQA, MuSiQue, 2Wiki, and Bamboogle: the 3B model gains +8.48% EM and +11.56% F1, and the 7B model gains +5.38% EM and +7.22% F1, over previous state-of-the-art (SOTA) results. These findings suggest that erasable reinforcement learning provides a powerful paradigm shift for robust multi-step reasoning in LLMs.
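The identify, erase, and regenerate loop described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the names `erase_and_regenerate`, `is_faulty`, and `regenerate` are hypothetical, the checker and regenerator are stand-in callables rather than a trained model, and the choice to erase everything from the first faulty step onward is an assumption made for illustration only.

```python
from typing import Callable, List, Optional

Step = str  # one reasoning step, e.g. a sub-question, a search call, or an answer


def first_faulty_index(steps: List[Step],
                       is_faulty: Callable[[Step], bool]) -> Optional[int]:
    """Return the index of the first step flagged as faulty, or None."""
    for i, step in enumerate(steps):
        if is_faulty(step):
            return i
    return None


def erase_and_regenerate(steps: List[Step],
                         is_faulty: Callable[[Step], bool],
                         regenerate: Callable[[List[Step]], List[Step]],
                         max_rounds: int = 3) -> List[Step]:
    """Repeatedly erase from the first faulty step and regenerate the rest.

    Assumption (not from the paper): steps after a faulty one depend on it,
    so they are erased together with it.
    """
    steps = list(steps)
    for _ in range(max_rounds):
        idx = first_faulty_index(steps, is_faulty)
        if idx is None:
            break  # no faulty step left
        kept = steps[:idx]                 # erase the faulty step and what follows
        steps = kept + regenerate(kept)    # continue from the clean prefix
    return steps


# Toy usage with hand-written stand-ins for the checker and the generator.
chain = [
    "decompose: who directed Film X?",
    "search: Film Y director",      # retrieval aimed at the wrong entity
    "answer: wrong person",
]


def toy_is_faulty(step: Step) -> bool:
    return "Film Y" in step or "wrong" in step


def toy_regenerate(prefix: List[Step]) -> List[Step]:
    return ["search: Film X director", "answer: Director Z"]


fixed = erase_and_regenerate(chain, toy_is_faulty, toy_regenerate)
```

The sketch only shows control flow; in the framework described above, both detecting faulty steps and regenerating them would be learned behaviours rather than fixed functions.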

Ziliang Wang, Kang An, Xuhui Zheng, Faqiang Qian, Weikun Zhang, Cijun Ouyang, Jialu Cai, Yuhang Wang, Yichao Wu • 2025

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	2Wiki	Exact Match 57.7	215
Multi-hop Question Answering	HotpotQA offline Wiki-18 (test val)	EM 44.7	24
Multi-hop Question Answering	2WikiMultiHopQA offline Wiki-18 (test val)	Exact Match 43.6	24
Multi-hop Question Answering	MuSiQue offline Wiki-18 (test val)	EM 24.4	24
Multi-hop Question Answering	Bamboogle offline Wiki-18 (test val)	Exact Match (EM) 53.4	24
Multi-hop Question Answering	HotpotQA online Google Search API (test val)	Exact Match (EM) 51.3	24
Multi-hop Question Answering	2WikiMultiHopQA online Google Search API (test val)	Exact Match 63.5	24
Multi-hop Question Answering	MuSiQue online Google Search API (val test)	EM 26.5	24
Multi-hop Question Answering	Bamboogle online Google Search API (test val)	Exact Match 68.7	24
Multi-hop Question Answering	HotpotQA Wiki-18	Exact Match 44.7	20

Showing 10 of 13 rows
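For readers unfamiliar with the EM and F1 figures reported above, the following is a minimal sketch of how these two metrics are commonly computed for question answering, using the widely used SQuAD-style answer normalization. Whether the paper's evaluation uses exactly this normalization is an assumption; the function names `normalize`, `exact_match`, and `f1_score` are chosen here for illustration.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Example: a partially correct answer scores 0 on EM but is credited by F1.
em = exact_match("Paris, France", "Paris")
f1 = f1_score("Paris, France", "Paris")
```

Benchmark scores are then the mean of these per-question values over the evaluation set, typically reported as percentages.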
