
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

About

Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, yet their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues. We introduce Mem0, a scalable memory-centric architecture that addresses this issue by dynamically extracting, consolidating, and retrieving salient information from ongoing conversations. Building on this foundation, we further propose an enhanced variant that leverages graph-based memory representations to capture complex relational structures among conversational elements. Through comprehensive evaluations on the LOCOMO benchmark, we systematically compare our approaches against six baseline categories: (i) established memory-augmented systems, (ii) retrieval-augmented generation (RAG) with varying chunk sizes and k-values, (iii) a full-context approach that processes the entire conversation history, (iv) an open-source memory solution, (v) a proprietary model system, and (vi) a dedicated memory management platform. Empirical results show that our methods consistently outperform all existing memory systems across four question categories: single-hop, temporal, multi-hop, and open-domain. Notably, Mem0 achieves a 26% relative improvement in the LLM-as-a-Judge metric over OpenAI, while Mem0 with graph memory achieves an overall score roughly 2% higher than the base configuration. Beyond accuracy gains, we also markedly reduce computational overhead compared to the full-context method. In particular, Mem0 attains 91% lower p95 latency and more than 90% savings in token cost, offering a compelling balance between advanced reasoning capabilities and practical deployment constraints. Our findings highlight the critical role of structured, persistent memory mechanisms for long-term conversational coherence, paving the way for more reliable and efficient LLM-driven AI agents.
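The extract-consolidate-retrieve loop described in the abstract can be illustrated with a minimal sketch. This is not the Mem0 implementation or its API: the `MemoryStore` class, sentence-splitting extraction, and word-overlap retrieval below are simplified stand-ins (a real system would use an LLM for extraction and dense embeddings for search), shown only to make the three-stage pipeline concrete.

```python
from collections import Counter

class MemoryStore:
    """Illustrative sketch of an extract/consolidate/retrieve memory pipeline."""

    def __init__(self):
        self.memories = []  # stored salient facts (plain strings)

    def _extract(self, message):
        # Stand-in for LLM-based salient-fact extraction:
        # treat each sentence of the message as a candidate fact.
        return [s.strip() for s in message.split(".") if s.strip()]

    def _consolidate(self, fact):
        # Skip exact duplicates; a real consolidator would also
        # merge, update, or invalidate conflicting memories.
        if fact not in self.memories:
            self.memories.append(fact)

    def add(self, message):
        for fact in self._extract(message):
            self._consolidate(fact)

    def search(self, query, k=3):
        # Rank stored facts by word overlap with the query
        # (a crude proxy for embedding similarity).
        q = Counter(query.lower().split())
        scored = sorted(
            self.memories,
            key=lambda f: -sum((Counter(f.lower().split()) & q).values()),
        )
        return scored[:k]

store = MemoryStore()
store.add("Alice lives in Paris. Alice works as a chemist.")
store.add("Alice lives in Paris.")  # duplicate fact is not stored twice
print(store.search("Where does Alice live", k=1))  # → ['Alice lives in Paris']
```

The graph-memory variant mentioned above would additionally store relational structure (e.g. entity-relation-entity links between facts) rather than a flat list, which this sketch omits.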

Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, Deshraj Yadav • 2025

Related benchmarks

Task                             | Dataset            | Metric     | Result | Rank
Multi-hop Question Answering     | HotpotQA           | F1 Score   | 30.13  | 221
Question Answering               | MuSiQue            | EM         | 23.33  | 84
Long-term memory evaluation      | Locomo             | Overall F1 | 45.09  | 70
Multi-hop Question Answering     | Locomo             | F1         | 42.57  | 67
Long-context Question Answering  | Locomo             | Average F1 | 45.09  | 64
Question Answering               | NarrativeQA (test) | ROUGE-L    | 5.23   | 61
Long-context Memory Retrieval    | Locomo             | Single-hop | 73.33  | 55
Open-domain Question Answering   | Locomo             | F1         | 0.2864 | 53
Single-hop Question Answering    | Locomo             | F1         | 0.4849 | 53
Interactive Decision-making      | AlfWorld           | PICK       | 54     | 52

(Showing 10 of 108 rows)
