
Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents

About

Retrieval-augmented agents can query external evidence, yet their reliability in multi-step reasoning remains limited: noisy retrieval may derail multi-hop question answering, while outcome-only reinforcement learning provides credit signals that are too coarse to optimize intermediate steps. We propose EvalAct (Evaluate-as-Action), which converts implicit retrieval quality assessment into an explicit action and enforces a coupled Search-to-Evaluate protocol so that each retrieval is immediately followed by a structured evaluation score, yielding process signals aligned with the interaction trajectory. To leverage these signals, we introduce Process-Calibrated Advantage Rescaling (PCAR), a GRPO-based optimization method that rescales advantages at the segment level according to evaluation scores, emphasizing reliable segments while updating uncertain ones conservatively. Experiments on seven open-domain QA benchmarks show that EvalAct achieves the best average accuracy, with the largest gains on multi-hop tasks, and ablations verify that the explicit evaluation loop drives the primary improvements while PCAR provides consistent additional benefits.
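The coupled Search-to-Evaluate protocol can be sketched as a simple agent loop in which every retrieval is immediately followed by an explicit evaluation action. This is an illustrative sketch only: the function names (`search`, `evaluate`, `generate`), the prompt kinds, and the stopping rule are hypothetical stand-ins, not the paper's actual interface.

```python
# Illustrative sketch of a coupled Search-to-Evaluate agent loop.
# Assumptions (not from the paper): the agent is driven by three callables,
# and the evaluation action returns a structured score in [0, 1].

def run_agent(question, search, evaluate, generate, max_steps=4):
    """Each retrieval is immediately followed by a structured evaluation
    score, producing one process signal per interaction segment."""
    segments = []
    context = question
    for _ in range(max_steps):
        query = generate("query", context)   # agent proposes a search query
        docs = search(query)                 # retrieve external evidence
        score = evaluate(query, docs)        # explicit evaluate action, in [0, 1]
        segments.append({"query": query, "docs": docs, "score": score})
        context += f"\n[docs] {docs} [eval] {score:.2f}"
        if generate("done?", context) == "yes":
            break
    answer = generate("answer", context)
    return answer, segments
```

The per-segment scores collected here are exactly the process signals that a segment-level training objective can later consume.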
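To make the PCAR idea concrete, here is a minimal sketch of segment-level advantage rescaling, assuming GRPO-style group-normalized advantages and per-segment evaluation scores in [0, 1]. The linear rescaling rule and the `floor` parameter are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of process-calibrated advantage rescaling (PCAR-style).
# Assumptions (not from the paper): GRPO group-relative advantages, and each
# trajectory split into segments carrying a self-evaluation score in [0, 1].

def grpo_advantages(rewards):
    """Group-relative advantage: reward minus group mean, over group std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

def rescale_segment_advantage(advantage, eval_score, floor=0.5):
    """Scale a segment's advantage by its evaluation score: reliable
    segments (score near 1) keep the full signal, while uncertain
    segments (low score) receive a conservatively shrunk update."""
    weight = floor + (1.0 - floor) * eval_score  # in [floor, 1]
    return advantage * weight

# Toy usage: one group of 4 trajectories with 0/1 outcome rewards.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
strong = rescale_segment_advantage(advs[0], eval_score=0.9)
weak = rescale_segment_advantage(advs[0], eval_score=0.2)
```

The intent is that `strong > weak`: a high-confidence segment from a winning trajectory drives a larger policy update than a segment the agent itself scored as uncertain.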

Jiangming Shu, Yuxiang Zhang, Ye Ma, Xueyuan Lin, Jitao Sang • 2026

Related benchmarks

Task                           Dataset    Metric  Result  Rank
Multi-hop Question Answering   2Wiki      EM      52.1    152
Multi-hop Question Answering   Bamboogle  EM      56      128
Multi-hop Question Answering   HotpotQA   EM      48.8    117
Single-hop Question Answering  PopQA      EM      43.6    104
Single-hop Question Answering  TriviaQA   EM      65.6    81
Multi-hop Question Answering   MuSiQue    EM      25.3    58
Single-hop Question Answering  NQ         EM      38.5    44
