
Memp: Exploring Agent Procedural Memory

About

Agents built on large language models (LLMs) excel at diverse tasks, yet their procedural memory is brittle: it is either manually engineered or entangled in static model parameters. In this work, we investigate strategies to endow agents with a learnable, updatable, and lifelong procedural memory. We propose Memp, which distills past agent trajectories into both fine-grained, step-by-step instructions and higher-level, script-like abstractions, and we explore how different strategies for the Build, Retrieval, and Update of procedural memory affect agent performance. Coupled with a dynamic regimen that continuously updates, corrects, and deprecates its contents, the memory repository evolves in lockstep with new experience. Empirical evaluation on TravelPlanner and ALFWorld shows that as the memory repository is refined, agents achieve steadily higher success rates and greater efficiency on analogous tasks. Moreover, procedural memory built from a stronger model retains its value: migrating it to a weaker model can also yield substantial performance gains. Code is available at https://github.com/zjunlp/MemP.
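The Build, Retrieval, and Update loop described above can be illustrated with a minimal Python sketch. It is not Memp's implementation (see the linked repository for that): the names `ProceduralMemory`, `ProcedureEntry`, and `_distill`, the word-overlap (Jaccard) retrieval standing in for embedding similarity, the trajectory-to-script formatting, and the failure-ratio deprecation rule are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lower-cased word set; a stand-in for embedding-based similarity."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _distill(trajectory: list[str]) -> tuple[list[str], str]:
    """Hypothetical distillation: numbered steps plus a one-line script.
    A real system would ask an LLM to write the abstraction."""
    steps = [f"{i + 1}. {action}" for i, action in enumerate(trajectory)]
    return steps, " -> ".join(trajectory)


@dataclass
class ProcedureEntry:
    task: str          # natural-language task description (retrieval key)
    steps: list[str]   # fine-grained, step-by-step instructions
    script: str        # higher-level, script-like abstraction
    successes: int = 0
    failures: int = 0
    deprecated: bool = False


@dataclass
class ProceduralMemory:
    entries: list[ProcedureEntry] = field(default_factory=list)
    max_failure_ratio: float = 0.5  # hypothetical deprecation threshold
    min_uses: int = 2               # hypothetical: uses before deprecation applies

    def build(self, task: str, trajectory: list[str],
              succeeded: bool) -> ProcedureEntry | None:
        """Build: store a successful trajectory as steps plus a script."""
        if not succeeded:
            return None  # this sketch only distills successful trajectories
        steps, script = _distill(trajectory)
        entry = ProcedureEntry(task, steps, script)
        self.entries.append(entry)
        return entry

    def retrieve(self, query: str, k: int = 1) -> list[ProcedureEntry]:
        """Retrieval: up to k non-deprecated entries with the highest word overlap."""
        q = _tokens(query)
        scored = []
        for entry in self.entries:
            if entry.deprecated:
                continue
            t = _tokens(entry.task)
            union = q | t
            score = len(q & t) / len(union) if union else 0.0
            if score > 0:
                scored.append((score, entry))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in scored[:k]]

    def update(self, entry: ProcedureEntry, succeeded: bool,
               corrected: list[str] | None = None) -> None:
        """Update: count outcomes, correct a failing entry, or deprecate it."""
        if succeeded:
            entry.successes += 1
            return
        if corrected:
            # Correction: overwrite the procedure and restart its statistics.
            entry.steps, entry.script = _distill(corrected)
            entry.successes = entry.failures = 0
            return
        entry.failures += 1
        uses = entry.successes + entry.failures
        if uses >= self.min_uses and entry.failures / uses > self.max_failure_ratio:
            entry.deprecated = True
```

In use, an agent would call `retrieve` before acting on a new task, prepend the returned steps or script to its prompt, and then call `update` with the outcome; repeated uncorrected failures eventually remove an entry from retrieval, which mirrors the "updates, corrects, and deprecates" regimen in spirit only.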

Runnan Fang, Yuan Liang, Xiaobin Wang, Jialong Wu, Shuofei Qiao, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang• 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Interactive Decision-making | AlfWorld | Overall Success Rate | 58.6 | 295 |
| Multi-hop Question Answering | HotpotQA | -- | -- | 294 |
| Embodied Task | AlfWorld | Overall Success Rate | 74.3 | 169 |
| Science Question Answering | GPQA | pass@1 Accuracy | 74.75 | 85 |
| Online Shopping | Webshop | Score | 25.3 | 61 |
| Query Answering | PersonaMem (32K context length) | Query-Answering Accuracy | 52 | 60 |
| Query Answering | PersonaMem (128K context length) | Query-Answering Accuracy | 0.4 | 60 |
| Interactive web-based shopping tasks | Webshop | Score | 25.3 | 60 |
| Online Shopping | WebShop (test) | Score | 25.3 | 59 |
| Web Shopping Agent | Webshop | Score | 51.3 | 53 |
Showing 10 of 57 rows
