
CARL: Focusing Agentic Reinforcement Learning on Critical Actions

About

Agents that accomplish complex tasks through multiple interactions with an environment have emerged as a popular research direction. In such multi-step settings, however, conventional group-level policy optimization algorithms become suboptimal because they implicitly assume that every action contributes equally to the outcome, an assumption that deviates significantly from reality. Our analysis reveals that only a small fraction of actions are critical in determining the final outcome. Building on this insight, we propose CARL, a critical-action-focused reinforcement learning algorithm tailored for long-horizon agentic reasoning. CARL uses entropy as a heuristic proxy for action criticality and focuses training by assigning rewards to high-criticality actions while excluding low-criticality actions from model updates, avoiding noisy credit assignment and redundant computation. Extensive experiments demonstrate that CARL achieves both stronger performance and higher efficiency across diverse evaluation settings. The source code will be made publicly available.
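The abstract describes entropy-based gating: actions whose policy distribution has high entropy are treated as critical and kept in the update, while low-entropy actions are masked out. Below is a minimal sketch of that idea under stated assumptions; the function names (`carl_update_mask`, `action_entropy`) and the top-fraction threshold `top_frac` are illustrative and not taken from the paper.

```python
import numpy as np

def action_entropy(probs):
    """Shannon entropy (in nats) of one action's policy distribution."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def carl_update_mask(action_probs, top_frac=0.2):
    """Boolean mask selecting high-entropy (critical) actions.

    action_probs: per-action probability vectors from the policy.
    top_frac: fraction of actions kept as critical (hypothetical knob);
              masked-out actions would be excluded from the policy update.
    """
    ents = np.array([action_entropy(p) for p in action_probs])
    k = max(1, int(np.ceil(top_frac * len(ents))))
    thresh = np.sort(ents)[-k]  # entropy of the k-th most uncertain action
    return ents >= thresh

# Example: 5 actions; only the most uncertain ones stay in the update.
probs = [
    np.array([0.97, 0.01, 0.01, 0.01]),  # near-deterministic -> low entropy
    np.array([0.25, 0.25, 0.25, 0.25]),  # maximally uncertain -> high entropy
    np.array([0.90, 0.05, 0.03, 0.02]),
    np.array([0.40, 0.30, 0.20, 0.10]),
    np.array([0.85, 0.10, 0.03, 0.02]),
]
mask = carl_update_mask(probs, top_frac=0.4)
```

With `top_frac=0.4`, only the two highest-entropy actions (the uniform one and the flattest remaining distribution) pass the gate; the near-deterministic actions are dropped, which is the paper's stated route to avoiding noisy credit assignment.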

Leyang Shen, Yang Zhang, Chun Kai Ling, Xiaoyan Zhao, Tat-Seng Chua • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
-----|---------|--------|--------|-----
Multi-hop Question Answering | 2WikiMQA | F1 Score | 74 | 154
Multi-hop Question Answering | MuSiQue | -- | -- | 106
Single-hop Question Answering | TriviaQA | -- | -- | 62
Single-hop Question Answering | PopQA | -- | -- | 55
Multi-hop Question Answering | HotpotQA | F1 Score | 62.6 | 31
Multi-hop Question Answering | Bamboogle | F1 | 61.6 | 25
Question Answering | Knowledge-Intensive Question Answering Benchmarks Aggregate | F1 | 59.2 | 15
Out-of-Distribution Evaluation | GAIA (OOD) | Avg@4 | 32.5 | 3
Out-of-Distribution Evaluation | Frames (OOD) | Avg@4 | 57.1 | 3
Out-of-Distribution Evaluation | xBench-DS (OOD) | Avg@4 | 46 | 3
