General Agentic Memory Via Deep Research
About
Memory is critical for AI agents, yet the widely adopted static memory, which aims to create readily available memory in advance, is inevitably subject to severe information loss. To address this limitation, we propose a novel framework called **general agentic memory (GAM)**. GAM follows the principle of "**just-in-time (JIT) compilation**": it focuses on creating optimized contexts for its client at runtime while keeping only simple but useful memory during the offline stage. To this end, GAM employs a dual design with the following components. 1) **Memorizer**, which highlights key historical information using a lightweight memory, while maintaining complete historical information within a universal page-store. 2) **Researcher**, which, guided by the pre-constructed memory, retrieves and integrates useful information from the page-store for each online request. This design allows GAM to effectively leverage the agentic capabilities and test-time scalability of frontier large language models (LLMs), while also facilitating end-to-end performance optimization through reinforcement learning. In our experimental study, we demonstrate that GAM achieves substantial improvement over existing memory systems on various memory-grounded task completion scenarios.
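The split between Memorizer and Researcher described above can be illustrated with a toy sketch. This is not the authors' implementation: the class and method names (`Memorizer.memorize`, `Researcher.research`), the "first few words" memo, and the keyword-overlap scoring are all assumptions made for illustration, standing in for the LLM-driven summarization and research steps that the abstract attributes to GAM.

```python
from dataclasses import dataclass, field
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    """Lowercased word set; a toy stand-in for LLM-based understanding."""
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    return words - {""}


@dataclass
class Memorizer:
    """Offline stage (hypothetical): keep every page in full plus a short memo."""
    pages: List[str] = field(default_factory=list)  # page-store: complete history
    memos: List[str] = field(default_factory=list)  # lightweight memory, one memo per page
    memo_words: int = 8

    def memorize(self, page: str) -> int:
        """Store the full page, record a short memo, and return the page id."""
        self.pages.append(page)
        # Assumption: the memo is just the first few words of the page.
        self.memos.append(" ".join(page.split()[: self.memo_words]))
        return len(self.pages) - 1


@dataclass
class Researcher:
    """Online stage (hypothetical): use memos to decide which full pages to read."""
    memory: Memorizer

    def research(self, request: str, top_k: int = 2) -> str:
        query = _tokens(request)
        # Score each memo by word overlap with the request.
        scored = [
            (len(query & _tokens(memo)), page_id)
            for page_id, memo in enumerate(self.memory.memos)
        ]
        # Keep matching pages only; best score first, earlier pages break ties.
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
        # Build the context from the full pages, not from the lossy memos.
        return "\n".join(self.memory.pages[page_id] for _, page_id in ranked[:top_k])


memory = Memorizer()
memory.memorize("Alice moved to Berlin in 2021. She works as a chemist at a lab near the river.")
memory.memorize("Bob adopted a dog named Rex. The dog likes long walks in the park.")
memory.memorize("Alice started learning Japanese. She practises every morning before work.")

context = Researcher(memory).research("Where does Alice work?")
print(context)
```

In this sketch the memo for the first page never mentions "chemist", yet the assembled context does, because the memo is used only to locate the page and the answer is read from the full page in the page-store. That is the property the abstract emphasizes: the offline memory stays small, and the complete history remains available at request time.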
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-hop Question Answering | HotpotQA | F1 Score 62.1 | 294 |
| Long-term memory evaluation | Locomo | -- | 128 |
| Multi-hop Question Answering | Locomo | F1 41.17 | 125 |
| Question Answering | NarrativeQA | F1 Score 27.07 | 124 |
| Open-domain Question Answering | Locomo | F1 0.341 | 111 |
| Single-hop Question Answering | Locomo | F1 0.5838 | 111 |
| Long-context Memory Evaluation | LongMemEval | Average Score 76.2 | 103 |
| Temporal Question Answering | Locomo | F1 0.5952 | 85 |
| Multi-hop Reasoning | Locomo | F1 Score 38.09 | 68 |
| Open Domain | Locomo | F1 Score 23.82 | 51 |