SE-Search: Self-Evolving Search Agent via Memory and Dense Reward

About

Retrieval-augmented generation (RAG) reduces hallucinations and factual errors in large language models (LLMs) by conditioning generation on retrieved external knowledge. Recent search agents go further and cast RAG as an autonomous, multi-turn information-seeking process. However, existing methods often accumulate irrelevant or noisy documents and rely on sparse reinforcement learning signals. We propose Self-Evolving Search (SE-Search), a search agent that improves its online search behavior through three components: memory purification, atomic query training, and dense rewards. SE-Search follows a Think-Search-Memorize strategy that retains salient evidence while filtering out irrelevant content. Atomic query training promotes shorter and more diverse queries, improving evidence acquisition. Dense rewards provide fine-grained feedback that speeds up training. Experiments on single-hop and multi-hop question answering benchmarks show that SE-Search-3B outperforms strong baselines, yielding a 10.8-point absolute improvement and a 33.8% relative gain over Search-R1. (We will make the code and model weights publicly available upon acceptance.)
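The abstract names the Think-Search-Memorize loop, memory purification, atomic queries, and dense rewards but does not give their exact form, so the following is only a minimal Python sketch of how these pieces might fit together. It assumes a toy keyword-overlap retriever over a four-sentence corpus, a scripted query plan in place of the LLM policy, a lexical-overlap filter as a stand-in for memory purification, and a hypothetical reward that adds a per-step bonus when memorized evidence contains the gold answer and a penalty for queries longer than `max_query_words`. All names (`search`, `purify`, `run_episode`, `dense_reward`) and coefficient values are illustrative, not SE-Search's actual implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical toy corpus standing in for a real retrieval index.
CORPUS = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Bananas are rich in potassium.",
    "France borders Spain and Italy.",
]


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set used for toy lexical matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, k: int = 2) -> list[str]:
    """Search: return the k documents with the largest word overlap with the query."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:k]


def purify(question: str, query: str, docs: list[str], min_overlap: int = 2) -> list[str]:
    """Assumed purification filter: keep docs sharing >= min_overlap words with question + query."""
    anchor = tokens(question) | tokens(query)
    return [doc for doc in docs if len(anchor & tokens(doc)) >= min_overlap]


@dataclass
class Step:
    query: str
    memorized: list[str]  # evidence newly added to memory at this step


def run_episode(question: str, plan: list[str]) -> list[Step]:
    """Think-Search-Memorize loop; a scripted plan replaces the LLM's 'think' step."""
    memory: list[str] = []
    steps: list[Step] = []
    for query in plan:  # Think: the plan supplies the next (ideally atomic) query
        docs = search(query)  # Search
        kept = [d for d in purify(question, query, docs) if d not in memory]
        memory.extend(kept)  # Memorize: only purified, not-yet-stored evidence
        steps.append(Step(query, kept))
    return steps


def dense_reward(steps: list[Step], prediction: str, gold: str,
                 max_query_words: int = 6, step_bonus: float = 0.1,
                 length_penalty: float = 0.05) -> float:
    """Hypothetical dense reward: outcome term plus per-step evidence and query-length terms."""
    # Outcome term: exact match on the final answer.
    reward = 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0
    for step in steps:
        # Process term: bonus when newly memorized evidence mentions the gold answer.
        if any(gold.lower() in doc.lower() for doc in step.memorized):
            reward += step_bonus
        # Atomic-query term: penalize every word beyond max_query_words.
        extra = max(0, len(step.query.split()) - max_query_words)
        reward -= length_penalty * extra
    return reward


if __name__ == "__main__":
    question = "What is the capital of France?"
    steps = run_episode(question, ["capital of France"])
    print(steps[0].memorized)  # ['Paris is the capital of France.']
    print(round(dense_reward(steps, "Paris", "Paris"), 2))  # 1.1
```

In this toy run, purification drops "France borders Spain and Italy." because it shares only one word with the question and query. The per-step terms are what make the reward dense: a trajectory with a wrong final answer (for example, predicting "Lyon") still earns 0.1 for having memorized the right evidence, whereas an exact-match-only reward would give it zero.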

Jian Li, Yizhang Jin, Dongqi Liu, Hang Ding, Jiafu Wu, Dongsheng Chen, Yunhang Shen, Yulei Qin, Ying Tai, Chengjie Wang, Xiaotong Yuan, Yabiao Wang• 2026

Related benchmarks

Task	Dataset	Metric	Result
Multi-hop Question Answering	HotpotQA (test)	--	--	311
Multi-hop Question Answering	2WikiMultiHopQA (test)	EM	36.1	226
Question Answering	PopQA	--	--	186
Question Answering	HotpotQA	EM	45	173
Question Answering	TriviaQA	--	--	117
Question Answering	2WikiMultihopQA	EM	36.1	107
Multi-hop Question Answering	Bamboogle (test)	EM	42.4	98
Question Answering	MuSiQue	EM	18.3	71
Question Answering	Bamboogle	EM Accuracy (%)	42.4	68
Single-hop Question Answering	TriviaQA (test)	Accuracy	62.4	50

(10 of 12 related benchmark rows listed.)
