
Memory OS of AI Agent

About

Large Language Models (LLMs) are constrained by fixed context windows and inadequate memory management, which leaves AI agents with little long-term memory and limited personalization in interactive use. To address this, we propose a Memory Operating System, MemoryOS, for comprehensive and efficient memory management in AI agents. Inspired by memory management principles in operating systems, MemoryOS uses a hierarchical storage architecture and consists of four key modules: Memory Storage, Updating, Retrieval, and Generation. The architecture comprises three levels of storage units: short-term memory, mid-term memory, and long-term personal memory. Key operations include dynamic updates between storage units: short-term to mid-term updates follow a dialogue-chain-based FIFO principle, while mid-term to long-term updates use a segmented page organization strategy. MemoryOS thereby enables hierarchical memory integration and dynamic updating. Extensive experiments on the LoCoMo benchmark show an average improvement of 49.11% on F1 and 46.18% on BLEU-1 over the baselines on GPT-4o-mini, indicating improved contextual coherence and personalized memory retention in long conversations. The implementation code is open-sourced at https://github.com/BAI-LAB/MemoryOS.
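
The three-level hierarchy and the FIFO hand-off described above can be illustrated with a toy sketch. This is not the MemoryOS implementation or its API: the class name `HierarchicalMemory`, the `topic` key, and the `short_capacity` and `promote_after` parameters are all hypothetical, plain FIFO stands in for the dialogue-chain-based variant, and the rule "promote a topic segment once it holds enough pages" is a simplifying assumption in place of the paper's segmented page organization strategy.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HierarchicalMemory:
    """Toy three-tier store: short-term -> mid-term -> long-term."""
    short_capacity: int = 3   # hypothetical size of the short-term queue
    promote_after: int = 2    # hypothetical segment size that triggers promotion
    short_term: deque = field(default_factory=deque)
    mid_term: dict = field(default_factory=dict)   # topic -> list of pages
    long_term: dict = field(default_factory=dict)  # topic -> list of pages

    def add(self, topic: str, text: str) -> None:
        """Append a dialogue page; evict the oldest page when the queue overflows."""
        self.short_term.append((topic, text))
        while len(self.short_term) > self.short_capacity:
            old_topic, old_text = self.short_term.popleft()  # FIFO eviction
            self._to_mid_term(old_topic, old_text)

    def _to_mid_term(self, topic: str, text: str) -> None:
        """Group evicted pages into per-topic segments; promote full segments."""
        segment = self.mid_term.setdefault(topic, [])
        segment.append(text)
        if len(segment) >= self.promote_after:  # assumed promotion rule
            self.long_term.setdefault(topic, []).extend(self.mid_term.pop(topic))

    def retrieve(self, topic: str) -> list:
        """Collect pages for a topic from all tiers, most durable tier first."""
        recent = [t for tp, t in self.short_term if tp == topic]
        return self.long_term.get(topic, []) + self.mid_term.get(topic, []) + recent


mem = HierarchicalMemory()
for i in range(6):
    mem.add("travel" if i % 2 == 0 else "food", f"turn {i}")

print(mem.retrieve("travel"))  # ['turn 0', 'turn 2', 'turn 4']
print(mem.retrieve("food"))    # ['turn 1', 'turn 3', 'turn 5']
```

In this run the two oldest "travel" pages have been evicted from the short-term queue, grouped into one mid-term segment, and promoted to long-term storage, while the single evicted "food" page is still waiting in mid-term memory.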

Jiazheng Kang, Mingming Ji, Zhe Zhao, Ting Bai • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Long-term memory evaluation | Locomo | Overall F1: 42.84 | 119 |
| Long-context Question Answering | Locomo | -- | 109 |
| Multi-hop Question Answering | Locomo | F1: 28.61 | 67 |
| Long-context Memory Evaluation | LongMemEval | Average Score: 62.75 | 52 |
| Temporal Reasoning | Locomo | F1 Score: 41.15 | 45 |
| Long-context reasoning and retrieval | LoCoMo (test) | Single-Hop F1: 48.62 | 37 |
| Open Domain | Locomo | F1 Score: 41.51 | 35 |
| Memory-augmented language modeling evaluation | LongMemEval-S | Accuracy: 49.6 | 31 |
| Multi-hop Reasoning | Locomo | F1 Score: 35.27 | 28 |
| Multi-party Dialogue Question Answering | LongDialQA | F1 Score: 10.61 | 28 |
Showing 10 of 55 rows
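
For readers unfamiliar with the F1 figures reported for question answering, the sketch below shows one common way such a score is computed: token-overlap F1 between a predicted answer and a reference answer. It assumes simple lowercasing and whitespace tokenization; the exact normalization used by each benchmark listed here is not stated on this page, and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Two of three predicted tokens match; both reference tokens are recovered.
score = token_f1("the cat sat", "the cat")  # precision 2/3, recall 1, F1 = 0.8
```

A leaderboard value such as 42.84 would then correspond to this per-question score averaged over the evaluation set and expressed as a percentage, under the same assumption.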
