
Agent Workflow Memory

About

Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks that involve complex action trajectories. In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. AWM applies to both offline and online scenarios, where agents induce workflows from training examples beforehand or from test queries on the fly. We experiment on two major web navigation benchmarks, Mind2Web and WebArena, that collectively cover 1000+ tasks from 200+ domains across travel, shopping, and social media, among others. AWM improves the baseline success rate by 24.6% and 51.1% (relative) on Mind2Web and WebArena, respectively, while reducing the number of steps taken to solve WebArena tasks successfully. Furthermore, online AWM robustly generalizes in cross-task, cross-website, and cross-domain evaluations, surpassing baselines by 8.9 to 14.0 absolute points as train-test task distribution gaps widen.

Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig • 2024
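
The online scenario described in the abstract (induce workflows from test queries on the fly, then feed them back to the agent) can be pictured with a small sketch. This is not the authors' implementation: `Workflow`, `WorkflowMemory`, `run_online`, and the `solve`, `judge`, and `induce` callables are all hypothetical names introduced here for illustration. The sketch also assumes some success check (`judge`) is available, and it passes every stored workflow to the agent rather than selecting among them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Workflow:
    """Hypothetical container for one reusable routine."""
    name: str          # short label, e.g. "search"
    steps: List[str]   # abstracted action sequence


@dataclass
class WorkflowMemory:
    """Hypothetical store of induced workflows, keyed by name."""
    workflows: Dict[str, Workflow] = field(default_factory=dict)

    def add(self, workflow: Workflow) -> None:
        # Toy de-duplication: keep the first workflow seen under each name.
        self.workflows.setdefault(workflow.name, workflow)

    def as_prompt(self) -> str:
        # Render stored workflows as text the agent could receive as context.
        return "\n".join(
            f"{wf.name}: " + " -> ".join(wf.steps)
            for wf in self.workflows.values()
        )


def run_online(
    tasks: List[str],
    solve: Callable[[str, str], List[str]],
    judge: Callable[[str, List[str]], bool],
    induce: Callable[[str, List[str]], List[Workflow]],
) -> Tuple[WorkflowMemory, List[bool]]:
    """Process tasks in order, growing the memory after each judged success."""
    memory = WorkflowMemory()
    outcomes: List[bool] = []
    for task in tasks:
        trajectory = solve(task, memory.as_prompt())  # agent sees current memory
        success = judge(task, trajectory)             # assumed success check
        outcomes.append(success)
        if success:
            for workflow in induce(task, trajectory):
                memory.add(workflow)
    return memory, outcomes


# Stub components standing in for a language model agent (assumptions).
def demo_solve(task: str, hints: str) -> List[str]:
    return ["click(search_box)", f"type({task!r})", "click(submit)"]


def demo_judge(task: str, trajectory: List[str]) -> bool:
    return len(trajectory) > 0


def demo_induce(task: str, trajectory: List[str]) -> List[Workflow]:
    # Replace the task-specific query with a placeholder variable.
    return [Workflow("search", ["click(search_box)", "type({query})", "click(submit)"])]


memory, outcomes = run_online(
    ["red shoes", "blue hat"], demo_solve, demo_judge, demo_induce
)
print(memory.as_prompt())
```

In the offline scenario, the same `induce` step would run once over training examples before any test task is attempted, and the resulting memory would stay fixed at test time.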

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Web navigation and task completion | WebArena (test) | Average Task Completion | 35.5 | 42 |
| Agentic task solving | AppWorld | TGC | 88 | 28 |
| Web navigation | WebArena | Overall Avg Success Rate | 39.37 | 23 |
| Interactive Instruction Following | ALFWorld (train) | Success Rate | 70 | 9 |
| Interactive Instruction Following | ALFWorld OOD | Success Rate | 90 | 9 |
| Strategic game playing | Mastermind Extreme | Average Return | 0.294 | 9 |
| Strategic game playing | Mastermind Hard | Average Return | 0.299 | 9 |
| Text-based Game Playing | Jericho Zork1 (test) | Average Score | 38.8 | 7 |
| Text-based Game Playing | Jericho Zork3 (test) | Avg Score | 1.9 | 7 |
| Text-based Game Playing | Jericho Library (test) | Average Score | 12.7 | 7 |
Showing 10 of 15 rows
