
General Agentic Memory Via Deep Research

About

Memory is critical for AI agents, yet the widely adopted static memory, which aims to create readily available memory in advance, is inevitably subject to severe information loss. To address this limitation, we propose a novel framework called general agentic memory (GAM). GAM follows the principle of "just-in-time (JIT) compilation": it focuses on creating optimized contexts for its client at runtime while keeping only simple but useful memory during the offline stage. To this end, GAM employs a duo-design with the following components. 1) Memorizer, which highlights key historical information using a lightweight memory, while maintaining complete historical information within a universal page-store. 2) Researcher, which retrieves and integrates useful information from the page-store for its online request, guided by the pre-constructed memory. This design allows GAM to effectively leverage the agentic capabilities and test-time scalability of frontier large language models (LLMs), while also facilitating end-to-end performance optimization through reinforcement learning. In our experimental study, we demonstrate that GAM achieves substantial improvement on various memory-grounded task completion scenarios against existing memory systems.
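The Memorizer/Researcher split described above can be illustrated with a toy sketch. This is not the authors' implementation: the class names `PageStore`, `Memorizer`, and `Researcher`, their methods, and the `top_k` parameter are hypothetical, and the LLM-driven steps are replaced by simple stand-ins (the first sentence of a page as its memo, word overlap as retrieval). It only shows the division of labor: a lightweight memo is written offline, while the full page is kept and fetched at request time.

```python
from dataclasses import dataclass, field


@dataclass
class PageStore:
    """Keeps every page verbatim (the complete historical information)."""
    pages: list[str] = field(default_factory=list)

    def add(self, text: str) -> int:
        self.pages.append(text)
        return len(self.pages) - 1  # index of the new page


@dataclass
class Memorizer:
    """Offline stage: store the full page, keep only a short memo per page."""
    store: PageStore
    memos: dict[int, str] = field(default_factory=dict)

    def memorize(self, text: str) -> None:
        page_id = self.store.add(text)
        # Assumption: the first sentence stands in for an LLM-written memo.
        self.memos[page_id] = text.split(".")[0].strip()


@dataclass
class Researcher:
    """Online stage: use memos to pick pages, then build a context from them."""
    memorizer: Memorizer

    def research(self, request: str, top_k: int = 2) -> str:
        query = set(request.lower().split())
        # Assumption: word overlap stands in for LLM-guided retrieval.
        ranked = sorted(
            self.memorizer.memos.items(),
            key=lambda item: len(query & set(item[1].lower().split())),
            reverse=True,
        )
        picked = [page_id for page_id, _ in ranked[:top_k]]
        # Return full pages rather than memos, so details survive to runtime.
        return "\n".join(self.memorizer.store.pages[p] for p in picked)


memorizer = Memorizer(PageStore())
memorizer.memorize("Alice moved to Paris. She started a job at a bakery in March.")
memorizer.memorize("Bob adopted a dog. The dog is named Rex.")
context = Researcher(memorizer).research("Where did Alice move", top_k=1)
print(context)
```

In this sketch the memo for the first page never mentions the bakery, yet the returned context does, because the Researcher reads the full page from the page-store instead of relying on the memo alone.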

B.Y. Yan, Chaofan Li, Hongjin Qian, Shuqi Lu, Zheng Liu • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | HotpotQA | F1 Score | 62.1 | 221 |
| Multi-hop Question Answering | Locomo | F1 | 41.17 | 67 |
| Open-domain Question Answering | Locomo | F1 | 0.341 | 53 |
| Single-hop Question Answering | Locomo | F1 | 0.5838 | 53 |
| Long-context reasoning and retrieval | LoCoMo (test) | Single-Hop F1 | 56.35 | 37 |
| Temporal Question Answering | Locomo | F1 | 0.5952 | 36 |
| Multi-hop Question Answering | HotpotQA 800 documents | F1 Score | 52.86 | 16 |
| Multi-hop Question Answering | HotpotQA 400 documents | F1 Score | 54.75 | 16 |
| Multi-hop Question Answering | HotpotQA 1600 documents | F1 | 53.71 | 16 |
| Complex High-Precision Reasoning | LoCoMo (test) | F1 Score | 45.31 | 8 |

