
Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents

About

Tool-using agents based on Large Language Models (LLMs) excel in tasks such as mathematical reasoning and multi-hop question answering. However, in long trajectories, agents often trigger excessive, low-quality tool calls, which increase latency and degrade inference performance and make tool-use behavior hard to manage. In this work, we conduct entropy-based pilot experiments and observe a strong positive correlation between entropy reduction and high-quality tool calls. Building on this finding, we propose using entropy reduction as a supervisory signal and design two reward strategies that address the differing needs of optimizing tool-use behavior. Sparse outcome rewards provide coarse, trajectory-level guidance to improve efficiency, while dense process rewards offer fine-grained supervision to enhance performance. Experiments across diverse domains show that both reward designs improve tool-use behavior: the former reduces tool calls by 72.07% compared to the average of baselines, while the latter improves performance by 22.27%. These results position entropy reduction as a key mechanism for enhancing tool-use behavior, enabling agents to be more adaptive in real-world applications.
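The abstract does not give the exact definitions of entropy reduction or of the two rewards, so the following is only a minimal sketch under stated assumptions. It assumes that entropy is the Shannon entropy of the model's next-token distribution, that "entropy reduction" is the drop in mean entropy from before a tool call to after it, that the dense process reward is the per-call reduction, and that the sparse outcome reward is the mean reduction over a trajectory. All function names are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(distributions: Sequence[Sequence[float]]) -> float:
    """Average entropy over a sequence of next-token distributions."""
    if not distributions:
        return 0.0
    return sum(shannon_entropy(d) for d in distributions) / len(distributions)


def entropy_reduction(
    before: Sequence[Sequence[float]],
    after: Sequence[Sequence[float]],
) -> float:
    """Assumed definition: mean entropy before a tool call minus mean entropy after it.

    A positive value means the model became more certain after the call.
    """
    return mean_entropy(before) - mean_entropy(after)


def dense_process_rewards(reductions: Sequence[float]) -> List[float]:
    """Assumed dense variant: one reward per tool call, equal to its entropy reduction."""
    return list(reductions)


def sparse_outcome_reward(reductions: Sequence[float]) -> float:
    """Assumed sparse variant: a single trajectory-level reward (mean reduction)."""
    if not reductions:
        return 0.0
    return sum(reductions) / len(reductions)


# Toy example: the model is maximally uncertain over four tokens before the
# call and nearly certain after it, so the reduction is positive.
before_call = [[0.25, 0.25, 0.25, 0.25]]
after_call = [[0.97, 0.01, 0.01, 0.01]]
helpful = entropy_reduction(before_call, after_call)

# A call that leaves the distribution unchanged yields zero reduction.
unhelpful = entropy_reduction(before_call, before_call)

per_call = dense_process_rewards([helpful, unhelpful])
trajectory = sparse_outcome_reward([helpful, unhelpful])
```

In this toy setup the helpful call earns a positive per-call reward and the unhelpful call earns zero, while the trajectory-level reward collapses both into one number. That mirrors the coarse versus fine-grained distinction the abstract draws, without claiming to reproduce the authors' formulation.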

Zeping Li, Hongru Wang, Yiwen Zhao, Guanhua Chen, Yixia Li, Keyang Chen, Yixin Cao, Guangnan Ye, Hongfeng Chai, Zhenfei Yin • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | AIME 2024 | Accuracy | 30.67 | 251 |
| Mathematical Reasoning | AIME 2025 | Accuracy | 30 | 227 |
| Knowledge-intensive reasoning | MuSiQue | Accuracy | 33.4 | 31 |
| Question Answering | 2WikiMultihopQA | Accuracy | 61.7 | 25 |
| Reasoning | HotpotQA | ACC1 | 64.6 | 25 |
| Tool-using Reasoning | Reasoning Domain Suite (AIME2024, AIME2025, HotpotQA, 2WikiMultihopQA, Musique) | Average Accuracy | 42.39 | 13 |
| Deep search | Average webw., hle, gaia | Accuracy | 9.87 | 7 |
| Knowledge-intensive reasoning | Knowledge-intensive reasoning suite (HotpotQA, 2WikiMultihopQA, Musique) | HotpotQA Score | 43.6 | 6 |
