TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision

About

Numerous large language model (LLM) agents have been built for tasks such as web navigation and online shopping, owing to LLMs' broad knowledge and text-understanding ability. Many of these works use in-context examples to generalize without fine-tuning, yet few have considered how to select these examples and use them effectively. Recently, methods that retrieve whole trajectories using task metadata and then use those trajectories as in-context examples have been proposed to improve the agent's overall performance in some sequential decision-making tasks. However, these methods can be problematic: the retrieved examples may look plausible without reflecting task-specific state-transition dynamics, and the resulting inputs are long and full of irrelevant context. In this paper, we propose a novel framework, TRAD, to address these issues. TRAD first conducts Thought Retrieval, selecting demonstrations at the step level via thought matching, which yields more helpful demonstrations and less irrelevant input noise. TRAD then introduces Aligned Decision, which complements the retrieved demonstration steps with their previous or subsequent steps; this makes the method tolerant of imperfect thoughts and offers a way to balance more context against less noise. Extensive experiments on the ALFWorld and Mind2Web benchmarks show that TRAD not only outperforms state-of-the-art models but also effectively reduces noise and promotes generalization. Furthermore, TRAD has been deployed in real-world scenarios at a global business insurance company, where it improves the success rate of robotic process automation.
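
The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Step` record, the function names `thought_retrieval` and `expand_steps`, and the `before`/`after` window parameters are hypothetical, and a bag-of-words cosine similarity stands in for whatever thought encoder and retriever TRAD actually uses. Thought retrieval here ranks individual steps (rather than whole trajectories) by how closely their recorded thought matches the agent's current thought; the expansion step then attaches neighboring steps from the same trajectory, where the window size trades more context against more noise.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Step:
    thought: str  # the reasoning recorded at this step of a demonstration
    action: str   # the action taken at this step


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of word counts (assumption: a stand-in for a real text encoder)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def thought_retrieval(query_thought, memory, k=2):
    """Rank every (trajectory, step) pair by thought similarity; return the top-k indices."""
    scored = [
        (bow_cosine(query_thought, step.thought), t_idx, s_idx)
        for t_idx, traj in enumerate(memory)
        for s_idx, step in enumerate(traj)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable sort keeps ties in memory order
    return [(t_idx, s_idx) for _, t_idx, s_idx in scored[:k]]


def expand_steps(hits, memory, before=1, after=1):
    """Attach up to `before` previous and `after` subsequent steps to each retrieved step."""
    demos = []
    for t_idx, s_idx in hits:
        traj = memory[t_idx]
        lo, hi = max(0, s_idx - before), min(len(traj), s_idx + after + 1)
        demos.append(traj[lo:hi])
    return demos


# Hypothetical ALFWorld-style demonstration memory.
memory = [
    [
        Step("find the apple in the fridge", "go to fridge 1"),
        Step("open the fridge to look for the apple", "open fridge 1"),
        Step("take the apple", "take apple 1 from fridge 1"),
    ],
    [
        Step("look for a mug on the shelf", "go to shelf 1"),
        Step("take the mug", "take mug 1 from shelf 1"),
    ],
]

hits = thought_retrieval("open the fridge to find the apple", memory, k=1)
demos = expand_steps(hits, memory, before=1, after=1)
for step in demos[0]:
    print(step.action)
```

Setting `before=0, after=0` reduces the sketch to pure step-level retrieval with the least context; widening the window adds surrounding steps, which can compensate for an imperfect thought match at the cost of a longer prompt.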

Ruiwen Zhou, Yingxuan Yang, Muning Wen, Ying Wen, Wenhao Wang, Chunling Xi, Guoqiang Xu, Yong Yu, Weinan Zhang · 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
GUI Navigation | Mind2Web (Cross-Website) | Element Accuracy | 36 | 23
Web Agent Navigation | Mind2Web Cross-Domain 1.0 | Element Accuracy | 38.8 | 16
Web Agent Navigation | Mind2Web Cross-Task 1.0 | Element Accuracy | 44.2 | 16
Web Agent Navigation | Mind2Web All 1.0 | Element Accuracy | 0.396 | 16
Web Action Generation Efficiency | Mind2Web Cross-Task | Time to Procedure | 3.24e+3 | 16
Web Action Generation Efficiency | Mind2Web (Cross-Website) | To_Pro Steps/Time | 3.28e+3 | 16
Web Action Generation Efficiency | Mind2Web Cross-Domain | To_Pro (Steps/Time) | 3.26e+3 | 16
Web Action Generation Efficiency | Mind2Web (All) | Time to Proposal Steps | 3.28e+3 | 16
Household Simulation | ALFWorld (out-of-distribution) | Put Success Rate | 12.5 | 12