Memory OS of AI Agent
About
Large Language Models (LLMs) face a crucial challenge from fixed context windows and inadequate memory management, leading to a severe shortage of long-term memory capabilities and limited personalization in the interactive experience with AI agents. To overcome this challenge, we innovatively propose a Memory Operating System, i.e., MemoryOS, to achieve comprehensive and efficient memory management for AI agents. Inspired by the memory management principles in operating systems, MemoryOS designs a hierarchical storage architecture and consists of four key modules: Memory Storage, Updating, Retrieval, and Generation. Specifically, the architecture comprises three levels of storage units: short-term memory, mid-term memory, and long-term personal memory. Key operations within MemoryOS include dynamic updates between storage units: short-term to mid-term updates follow a dialogue-chain-based FIFO principle, while mid-term to long-term updates use a segmented page organization strategy. Our pioneering MemoryOS enables hierarchical memory integration and dynamic updating. Extensive experiments on the LoCoMo benchmark show an average improvement of 49.11% on F1 and 46.18% on BLEU-1 over the baselines on GPT-4o-mini, showing contextual coherence and personalized memory retention in long conversations. The implementation code is open-sourced at https://github.com/BAI-LAB/MemoryOS.
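The short-term to mid-term update described above can be illustrated with a minimal sketch. It is not the MemoryOS implementation: the names `DialoguePage` and `TieredMemory`, the `topic` field, and the capacity value are all hypothetical, and grouping evicted pages by a topic label is only an assumed stand-in for the segmented page organization. Long-term personal memory is omitted because the abstract does not describe its update rule in enough detail.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DialoguePage:
    """One query/response exchange (hypothetical structure)."""
    query: str
    response: str
    topic: str  # assumed grouping key; not specified in the abstract


@dataclass
class TieredMemory:
    """Two of the three storage levels, with FIFO eviction between them."""
    short_term_capacity: int = 3  # illustrative value, not from the paper
    short_term: deque = field(default_factory=deque)
    # Mid-term memory as segments: topic label -> list of evicted pages.
    mid_term: dict = field(default_factory=dict)

    def add(self, page: DialoguePage) -> None:
        """Append a page; move the oldest pages to mid-term when over capacity."""
        self.short_term.append(page)
        while len(self.short_term) > self.short_term_capacity:
            oldest = self.short_term.popleft()  # FIFO: first in, first out
            self.mid_term.setdefault(oldest.topic, []).append(oldest)


if __name__ == "__main__":
    memory = TieredMemory(short_term_capacity=2)
    for i in range(4):
        topic = "travel" if i % 2 == 0 else "food"
        memory.add(DialoguePage(f"q{i}", f"r{i}", topic))
    print([p.query for p in memory.short_term])
    print({t: [p.query for p in pages] for t, pages in memory.mid_term.items()})
```

Using a `deque` keeps eviction of the oldest page cheap, and the dictionary of lists makes the segment structure explicit without committing to any particular similarity or scoring rule.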
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Long-term memory evaluation | Locomo | Overall F1 | 42.84 | 70 |
| Multi-hop Question Answering | Locomo | F1 | 28.61 | 67 |
| Long-context reasoning and retrieval | LoCoMo (test) | Single-Hop F1 | 48.62 | 37 |
| Memory-Augmented Dialogue | PersonaMem v1.0 (test) | Overall Score | 59.97 | 28 |
| Long-context Memory Evaluation | LongMemEval | Single-Turn Preference | 50 | 28 |
| Long-term memory evaluation | LongMemEval S (test) | KU (Knowledge Update) | 60 | 27 |
| MemoryBench Task | MemoryBench Short-Input-Long-Output 1.0 | Norm-Score | 74.62 | 24 |
| MemoryBench Task | MemoryBench Short-Input-Short-Output | Norm-Score | 70.56 | 24 |
| MemoryBench Task | MemoryBench Long-Input-Long-Output | Norm-Score | 45.96 | 24 |
| Long-context Question Answering | Locomo | -- | -- | 24 |