CARL: Criticality-Aware Agentic Reinforcement Learning

About

Agents capable of accomplishing complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, the conventional group-level policy optimization algorithm becomes suboptimal because of its underlying assumption that each step holds equal contribution, which deviates significantly from reality. Our analysis reveals that only the action choices on a small fraction of states are critical in determining the final outcome. Building on this insight, we propose CARL, a criticality-aware reinforcement learning algorithm tailored for long-horizon agentic reasoning. CARL leverages entropy as a heuristic proxy for state criticality and achieves focused training by assigning rewards to actions taken from high-criticality states while excluding actions taken from low-criticality states from model updates, avoiding noisy credit assignment and redundant computation. Extensive experiments demonstrate that CARL achieves both stronger performance and higher efficiency across diverse evaluation settings. The source code will be publicly available.

Leyang Shen, Yang Zhang, Chun Kai Ling, Xiaoyan Zhao, Tat-Seng Chua• 2025

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	MuSiQue	--	209
Single-hop Question Answering	PopQA	--	186
Multi-hop Question Answering	2WikiMQA	F1 Score74	161
Single-hop Question Answering	TriviaQA	--	133
Multi-hop Question Answering	HotpotQA	F1 Score62.6	31
Multi-hop Question Answering	Bamboogle	F161.6	25
Question Answering	Knowledge-Intensive Question Answering Benchmarks Aggregate	F159.2	15
Out-of-Distribution Evaluation	GAIA (OOD)	Avg@432.5	3
Out-of-Distribution Evaluation	Frames (OOD)	Avg@457.1	3
Out-of-Distribution Evaluation	xBench-DS (OOD)	Avg@446	3

Showing 10 of 10 rows

Other info

Follow for update

@wizwand_team Discord