Adaptive Information Control for Search-Augmented LLM Reasoning

About

Search-augmented reasoning agents interleave multi-step reasoning with external retrieval, but uncontrolled retrieval can introduce redundant evidence, saturate the context, and destabilize reinforcement learning (RL). Existing outcome-based RL methods provide only sparse terminal rewards, offering limited guidance for intermediate information-acquisition decisions. We propose DeepControl, an adaptive information-control framework based on information utility, a state-dependent estimate of the marginal value of retrieved evidence. The framework regulates information acquisition along two axes: extent, i.e., whether retrieval should continue, and resolution, i.e., how much retrieved detail should be exposed. It implements these controls through retrieval-continuation guidance, hierarchical granularity control, and an annealed control-forcing scheme. This enables the policy to internalize effective acquisition behavior during training and operate without external control at test time. Across seven benchmarks, DeepControl consistently outperforms strong RL and retrieval baselines that lack explicit information control; compared with Search-R1, it improves average performance by +9.4 and +8.6 points on Qwen2.5-7B and Qwen2.5-3B, respectively. Additional analyses show improved search effectiveness, training stability, and evidence utilization.
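
The two control axes and the annealed forcing scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the threshold values, the two granularity labels, and the linear annealing schedule are all hypothetical choices made for illustration, and the utility score is assumed to be a number in [0, 1] supplied by some external estimator.

```python
import random

def forcing_probability(step, total_steps, p_start=1.0, p_end=0.0):
    """Hypothetical linear schedule: probability that the external
    controller overrides the policy's own decision at a training step."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return p_start + (p_end - p_start) * frac

def control_decision(utility, stop_threshold=0.1, detail_threshold=0.5):
    """Map an assumed utility score to the two axes named in the abstract:
    extent (continue retrieving?) and resolution (how much detail to expose).
    Thresholds and labels are illustrative assumptions."""
    continue_retrieval = utility >= stop_threshold
    granularity = "full" if utility >= detail_threshold else "summary"
    return continue_retrieval, granularity

def choose_action(policy_action, controller_action, step, total_steps, rng=random):
    """With annealed probability, force the controller's action;
    otherwise let the policy act on its own."""
    p = forcing_probability(step, total_steps)
    return controller_action if rng.random() < p else policy_action
```

Under these assumptions the controller dictates every decision at the start of training (`forcing_probability(0, 100)` is 1.0) and none at the end (`forcing_probability(100, 100)` is 0.0), which mirrors the stated goal of a policy that operates without external control at test time.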

Siheng Xiong, Oguzhan Gungordu, James C. Kerce, Faramarz Fekri • 2026

Related benchmarks

Task	Dataset	Result
General Question Answering	Natural Questions (NQ) (test val)	EM 55.8	24
General Question Answering	TriviaQA (test val)	EM 68.2	24
Multi-hop Question Answering	HotpotQA (test val)	EM 47.1	18
Multi-hop Question Answering	2WikiMultiHopQA (test val)	EM 43.9	16
General Question Answering	PopQA (test val)	EM 52.1	4
Multi-hop Question Answering	MuSiQue (test val)	EM 22.1	2
Multi-hop Question Answering	Bamboogle (test val)	EM 45.8	2
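
As a small worked example, the seven exact-match scores in the table above can be combined into an unweighted average. The sketch assumes all seven Result values are EM scores on the same 0 to 100 scale; the page does not state which model or configuration they correspond to, so the average is not tied to either Qwen2.5 variant here.

```python
# EM scores copied from the Result column of the table above.
em_scores = {
    "Natural Questions (NQ)": 55.8,
    "TriviaQA": 68.2,
    "HotpotQA": 47.1,
    "2WikiMultiHopQA": 43.9,
    "PopQA": 52.1,
    "MuSiQue": 22.1,
    "Bamboogle": 45.8,
}

# Unweighted (macro) average: each benchmark counts equally,
# regardless of how many questions it contains.
macro_average = sum(em_scores.values()) / len(em_scores)
print(f"Macro-average EM over {len(em_scores)} benchmarks: {macro_average:.1f}")
```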

