BAGEN: Are LLM Agents Budget-Aware?

About

While agents are spending more and more resources, agent cost today is mostly measured only after execution. A Budget-Aware Agent (BAGEN) should treat budget as an active control signal rather than a passive cost metric. We first systematically define budget estimation in terms of internal budgets (from agent computation) and external budgets (from agent actions). We then formalize budget-awareness as progressive interval estimation: at each step of a plan, an agent should predict upper and lower bounds on the remaining budget and alert when completion is unlikely. Scoring with a rollout-replay protocol, we find consistent failure patterns across four environments and five frontier agents: (1) Strong agents do not necessarily have strong budget-awareness, with correlation r=0.35. (2) Frontier models are consistently over-optimistic and continue spending on tasks that are unlikely to succeed instead of alerting the user early. (3) The budget-aware signal is actionable and trainable: early stopping saves 28-64% of tokens on failed trajectories, and SFT+RL strengthens early-stop and alert behavior. (4) Precise interval calibration remains challenging, with interval coverage capping at 47% after SFT+RL. Project page: https://ragen-ai.github.io/bagen/
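To make the idea of progressive interval estimation concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `StepEstimate`, `interval_coverage`, and `first_alert_step` are hypothetical, and the alert rule (raise an alert once the predicted lower bound exceeds the budget still available) is an assumption chosen for illustration. Coverage is computed as the fraction of steps whose predicted interval contains the remaining budget actually consumed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepEstimate:
    """One per-step prediction (hypothetical structure, for illustration only)."""
    lower: float             # predicted lower bound on remaining budget
    upper: float             # predicted upper bound on remaining budget
    actual_remaining: float  # budget actually consumed after this step


def interval_coverage(estimates: List[StepEstimate]) -> float:
    """Fraction of steps whose [lower, upper] interval contains the true value."""
    if not estimates:
        return 0.0
    hits = sum(1 for e in estimates if e.lower <= e.actual_remaining <= e.upper)
    return hits / len(estimates)


def first_alert_step(lower_bounds: List[float],
                     budget_left: List[float]) -> Optional[int]:
    """Index of the first step where even the optimistic (lower-bound)
    estimate exceeds the budget still available, or None if no such step.
    This alert rule is an assumption, not the paper's definition."""
    for step, (lower, left) in enumerate(zip(lower_bounds, budget_left)):
        if lower > left:
            return step
    return None


estimates = [
    StepEstimate(lower=10, upper=20, actual_remaining=15),  # covered
    StepEstimate(lower=5, upper=8, actual_remaining=9),     # missed (too low)
    StepEstimate(lower=1, upper=4, actual_remaining=2),     # covered
    StepEstimate(lower=0, upper=1, actual_remaining=3),     # missed (too low)
]
coverage = interval_coverage(estimates)
alert_at = first_alert_step([8, 6, 7], [10, 7, 5])
```

In this toy data, both missed intervals underestimate the remaining budget, which mirrors the over-optimism described in finding (2). Stopping at the alert step instead of running to exhaustion is the kind of early stop that finding (3) credits with token savings.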

Yuxiang Lin, Zihan Wang, Mengyang Liu, Yuxuan Shan, Longju Bai, Junyao Zhang, Xing Jin, Boshan Chen, Jinyan Su, Xingyao Wang, Jiaxin Pei, Manling Li • 2026

Related benchmarks

Task	Dataset	Result
Early-stopping budget estimation	Search-R1, Sokoban, SWE-bench, and Warehouse Aggregate (test)	--	5
Feasibility Prediction	SWE-Bench	--	5
Feasibility Prediction	Search-R1	--	5
Feasibility Prediction	Sokoban	--	5
Feasibility Prediction	Warehouse	--	5
Interval Quality	SWE-Bench	--	5
Interval Quality	Search-R1	--	5
Interval Quality	Sokoban	--	5
Interval Quality	Warehouse	--	5
Task Performance	SWE-Bench	--	5

Showing 10 of 13 rows
