
BAGEN: Are LLM Agents Budget-Aware?

About

Although agents consume ever more resources, agent cost today is mostly measured only after execution. A Budget-Aware Agent (BAGEN) should treat budget as an active control signal rather than a passive cost metric. We first systematically define budget estimation over internal budgets (from agent computation) and external budgets (from agent actions). We then formalize budget-awareness as progressive interval estimation: at each step of a plan, an agent should predict upper and lower bounds on the remaining budget, and alert when completion is unlikely. Scoring with a rollout-replay protocol, we find consistent failure patterns across four environments and five frontier agents: (1) Strong agents do not necessarily have strong budget-awareness, with correlation r=0.35. (2) Frontier models are consistently over-optimistic: they continue spending on tasks that are unlikely to succeed instead of alerting the user early. (3) The budget-aware signal is actionable and trainable: early stopping saves 28-64% of tokens on failed trajectories, and SFT+RL strengthens early-stop and alert behavior. (4) Precise interval calibration remains challenging, with interval coverage capping at 47% after SFT+RL. Project page: https://ragen-ai.github.io/bagen/
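The progressive interval estimation described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `StepEstimate`, `remaining_costs`, `interval_coverage`, and `first_alert_step` are hypothetical, and the alert rule (alert when the predicted lower bound on remaining cost exceeds the budget still available) is an assumed simplification of "alert when completion is unlikely".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepEstimate:
    """Hypothetical per-step prediction of the cost still needed to finish."""
    lower: float
    upper: float


def remaining_costs(step_costs: List[float]) -> List[float]:
    """Actual cost still to be spent, measured before each step."""
    remaining, total = [], sum(step_costs)
    for cost in step_costs:
        remaining.append(total)
        total -= cost
    return remaining


def interval_coverage(estimates: List[StepEstimate],
                      step_costs: List[float]) -> float:
    """Fraction of steps whose interval contains the actual remaining cost."""
    actual = remaining_costs(step_costs)
    hits = sum(1 for est, a in zip(estimates, actual)
               if est.lower <= a <= est.upper)
    return hits / len(estimates)


def first_alert_step(estimates: List[StepEstimate],
                     step_costs: List[float],
                     budget: float) -> Optional[int]:
    """First step at which even the optimistic bound exceeds the budget left.

    Assumed rule for illustration only; returns None if no alert fires.
    """
    spent = 0.0
    for i, (est, cost) in enumerate(zip(estimates, step_costs)):
        if est.lower > budget - spent:
            return i
        spent += cost
    return None


# Toy trajectory: three steps with made-up token costs and predictions.
costs = [100.0, 200.0, 300.0]
preds = [StepEstimate(500, 700), StepEstimate(200, 400), StepEstimate(250, 350)]
coverage = interval_coverage(preds, costs)
alert_at = first_alert_step(preds, costs, budget=400.0)
```

In this toy trajectory the actual remaining costs before each step are 600, 500, and 300, so two of the three intervals cover the true value. With a budget of 400, the alert fires at step 0, because the lower bound of 500 already exceeds the available budget.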

Yuxiang Lin, Zihan Wang, Mengyang Liu, Yuxuan Shan, Longju Bai, Junyao Zhang, Xing Jin, Boshan Chen, Jinyan Su, Xingyao Wang, Jiaxin Pei, Manling Li • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Early-stopping budget estimation | Search-R1, Sokoban, SWE-bench, and Warehouse Aggregate (test) | -- | 5 |
| Feasibility Prediction | SWE-Bench | -- | 5 |
| Feasibility Prediction | Search-R1 | -- | 5 |
| Feasibility Prediction | Sokoban | -- | 5 |
| Feasibility Prediction | Warehouse | -- | 5 |
| Interval Quality | SWE-Bench | -- | 5 |
| Interval Quality | Search-R1 | -- | 5 |
| Interval Quality | Sokoban | -- | 5 |
| Interval Quality | Warehouse | -- | 5 |
| Task Performance | SWE-Bench | -- | 5 |

Showing 10 of 13 rows
