Mnemis: Dual-Route Retrieval on Hierarchical Graphs for Long-Term LLM Memory
About
AI memory, specifically how models organize and retrieve historical messages, has become increasingly valuable to Large Language Models (LLMs), yet existing methods (RAG and Graph-RAG) retrieve memory primarily through similarity-based mechanisms. While efficient, such System-1-style retrieval struggles in scenarios that require global reasoning or comprehensive coverage of all relevant information. In this work, we propose Mnemis, a novel memory framework that integrates System-1 similarity search with a complementary System-2 mechanism, termed Global Selection. Mnemis organizes memory into a base graph for similarity retrieval and a hierarchical graph that enables top-down, deliberate traversal over semantic hierarchies. By combining the complementary strengths of the two retrieval routes, Mnemis retrieves memory items that are both semantically and structurally relevant. Mnemis achieves state-of-the-art performance among all compared methods on long-term memory benchmarks, scoring 93.9 on LoCoMo and 91.6 on LongMemEval-S using GPT-4.1-mini.
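The dual-route idea can be sketched in miniature as follows. This is an illustrative toy, not the paper's implementation: the class, method names, and 2-D embeddings are all assumptions. Route 1 ranks items by flat cosine similarity; Route 2 descends the semantic hierarchy toward the most query-relevant topic and returns every item under it, approximating comprehensive coverage; the final result is the union of both routes.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 for zero-norm inputs.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class DualRouteMemory:
    """Toy dual-route memory: a flat item pool plus a topic hierarchy."""

    def __init__(self):
        self.emb = {}       # node id -> embedding (topics and items alike)
        self.children = {}  # topic id -> list of child node ids

    def add_node(self, nid, embedding, children=()):
        self.emb[nid] = embedding
        if children:
            self.children[nid] = list(children)

    def is_item(self, nid):
        # Leaves of the hierarchy are memory items; internal nodes are topics.
        return nid not in self.children

    def similarity_route(self, query_emb, k=2):
        # Route 1 (System-1): flat top-k similarity search over all items.
        items = [n for n in self.emb if self.is_item(n)]
        items.sort(key=lambda n: cosine(query_emb, self.emb[n]), reverse=True)
        return items[:k]

    def global_selection_route(self, query_emb, root):
        # Route 2 (System-2): descend top-down, at each level following the
        # child most similar to the query, until a topic whose children are
        # all items is reached; return them all for full topic coverage.
        node = root
        while not self.is_item(node):
            kids = self.children[node]
            if all(self.is_item(c) for c in kids):
                return kids
            node = max(kids, key=lambda c: cosine(query_emb, self.emb[c]))
        return [node]

    def retrieve(self, query_emb, root, k=2):
        # Union of both routes: semantically close AND structurally relevant.
        hits = set(self.similarity_route(query_emb, k))
        hits |= set(self.global_selection_route(query_emb, root))
        return sorted(hits)

# Toy demo with hand-made 2-D embeddings.
m = DualRouteMemory()
m.add_node("root", [0.5, 0.5], ["sports", "food"])
m.add_node("sports", [1.0, 0.0], ["s1", "s2"])
m.add_node("food", [0.0, 1.0], ["f1", "f2"])
m.add_node("s1", [0.9, 0.1])
m.add_node("s2", [0.7, 0.3])
m.add_node("f1", [0.1, 0.9])
m.add_node("f2", [0.2, 0.8])

query = [1.0, 0.0]
print(m.similarity_route(query, k=1))        # → ['s1']
print(m.retrieve(query, "root", k=1))        # → ['s1', 's2']
```

Note how the similarity route alone returns only the single closest item, while the union adds `s2` because the hierarchy places it under the same selected topic, which is the coverage behavior the abstract attributes to Global Selection.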
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Long-context Memory Retrieval | LoCoMo | Single-hop: 97.1 | 55 |
| Long-term Memory Evaluation | LongMemEval-S (test) | KU (Knowledge Update): 93.6 | 27 |
| Long-context Question Answering | LoCoMo | Single-Hop LLJ Score: 97.1 | 24 |
| Long-term Memory Retrieval | LongMemEval-S | SSU: 98.6 | 9 |