Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints

About

Improving the reliability of large language models (LLMs) is critical for deploying them in real-world scenarios. In this paper, we propose Deliberative Searcher, the first framework to integrate certainty calibration with retrieval-based search for open-domain question answering. The agent performs multi-step reflection and verification over Wikipedia data and is trained with a reinforcement learning algorithm that optimizes for accuracy under a soft reliability constraint. Empirical results show that the proposed method improves alignment between model confidence and correctness, leading to more trustworthy outputs. This paper will be continuously updated.

Zhenyun Yin, Shujie Wang, Xuhong Wang, Xingjun Ma, Yinchun Wang • 2025
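
The abstract does not spell out how accuracy is optimized "under a soft reliability constraint". One common way to express such an objective is a Lagrangian relaxation: the reward pays for correct answers and subtracts a weighted penalty when stated confidence drifts from correctness, while the weight itself is adjusted by dual ascent. The sketch below illustrates that general idea only. The names Episode, reliability_gap, penalized_reward, and update_multiplier, as well as the tolerance and step values, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    correct: bool      # whether the final answer matched the reference
    confidence: float  # confidence stated by the model, in [0, 1]


def reliability_gap(ep: Episode) -> float:
    """Assumed violation measure: distance between confidence and correctness."""
    target = 1.0 if ep.correct else 0.0
    return abs(ep.confidence - target)


def penalized_reward(ep: Episode, lam: float, tolerance: float = 0.2) -> float:
    """Accuracy reward minus a Lagrangian penalty for the soft constraint
    reliability_gap <= tolerance (lam is the multiplier)."""
    accuracy_term = 1.0 if ep.correct else 0.0
    return accuracy_term - lam * (reliability_gap(ep) - tolerance)


def update_multiplier(lam: float, batch: List[Episode],
                      tolerance: float = 0.2, step: float = 0.1) -> float:
    """Dual ascent: raise lam when the batch violates the constraint on
    average, lower it otherwise, and never let it drop below zero."""
    mean_gap = sum(reliability_gap(ep) for ep in batch) / len(batch)
    return max(0.0, lam + step * (mean_gap - tolerance))


if __name__ == "__main__":
    batch = [Episode(correct=True, confidence=0.9),
             Episode(correct=False, confidence=0.9)]
    lam = 0.0
    for _ in range(3):
        lam = update_multiplier(lam, batch)
    print("multiplier after 3 updates:", lam)
    print("rewards:", [penalized_reward(ep, lam) for ep in batch])
```

Under these assumptions, a confidently wrong answer is penalized more heavily as the multiplier grows, whereas a well-calibrated batch lets the multiplier decay back toward zero so that the accuracy term dominates.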

Related benchmarks

Task	Dataset	Result
Question Answering	HotpotQA (In-Distribution)	F1 Score 10	23
Question Answering	2Wiki (In-Distribution)	Accuracy 65	14
General AI Assistant Tasks	GAIA (Out-of-Distribution)	Accuracy 35	14
Information Extraction	xbench-deepsearch (Out-of-Distribution)	Accuracy 35	14
Question Answering	MuSiQue (In-Distribution)	Accuracy 37	14
Question Answering	Overall (Average)	Accuracy 48	14
