Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents
About
Tool-using agents built on Large Language Models (LLMs) excel at tasks such as mathematical reasoning and multi-hop question answering. Over long trajectories, however, agents often trigger excessive, low-quality tool calls, which increases latency, degrades inference performance, and makes tool-use behavior difficult to manage. In this work, we conduct entropy-based pilot experiments and observe a strong positive correlation between entropy reduction and high-quality tool calls. Building on this finding, we propose entropy reduction as a supervisory signal and design two reward strategies that address the differing needs of optimizing tool-use behavior: sparse outcome rewards provide coarse, trajectory-level guidance to improve efficiency, while dense process rewards offer fine-grained supervision to enhance performance. Experiments across diverse domains show that both reward designs improve tool-use behavior: the former reduces tool calls by 72.07% relative to the baseline average, while the latter improves performance by 22.27%. These results position entropy reduction as a key mechanism for enhancing tool-use behavior, enabling agents to be more adaptive in real-world applications.
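The core signal described above, entropy reduction around a tool call, can be sketched with a few lines of standard Python. This is a minimal illustration, not the paper's implementation: the function names and the toy next-token distributions are assumptions, and in practice the distributions would come from the LLM's logits before and after the tool result is appended to the context.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_reduction_reward(probs_before, probs_after):
    """Hypothetical dense process reward for one tool call:
    positive when the call sharpens the model's predictive distribution
    (i.e., reduces uncertainty), negative when it adds noise."""
    return entropy(probs_before) - entropy(probs_after)

# Toy example: a high-quality tool call concentrates probability mass.
before = [0.25, 0.25, 0.25, 0.25]   # uncertain before the call
after = [0.85, 0.05, 0.05, 0.05]    # confident after the tool result
print(round(entropy_reduction_reward(before, after), 3))  # → 0.799
```

Under this sketch, a sparse outcome reward would aggregate such per-call signals (or score only the final trajectory), while the dense variant assigns the reward at each tool-call step.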
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | AIME 2024 | Accuracy: 30.67 | 251 |
| Mathematical Reasoning | AIME 2025 | Accuracy: 30 | 227 |
| Knowledge-intensive Reasoning | MuSiQue | Accuracy: 33.4 | 31 |
| Question Answering | 2WikiMultihopQA | Accuracy: 61.7 | 25 |
| Reasoning | HotpotQA | ACC1: 64.6 | 25 |
| Tool-using Reasoning | Reasoning domain suite (AIME 2024, AIME 2025, HotpotQA, 2WikiMultihopQA, MuSiQue) | Average Accuracy: 42.39 | 13 |
| Deep Search | webw., HLE, GAIA (average) | Accuracy: 9.87 | 7 |
| Knowledge-intensive Reasoning | Knowledge-intensive reasoning suite (HotpotQA, 2WikiMultihopQA, MuSiQue) | HotpotQA Score: 43.6 | 6 |