HiMem: Hierarchical Long-Term Memory for LLM Long-Horizon Agents
About
Although long-term memory systems have made substantial progress in recent years, they still exhibit clear limitations in adaptability, scalability, and self-evolution under continuous interaction settings. Inspired by cognitive theories, we propose HiMem, a hierarchical long-term memory framework for long-horizon dialogues, designed to support memory construction, retrieval, and dynamic updating during sustained interactions. HiMem constructs cognitively consistent Episode Memory via a Topic-Aware Event-Surprise Dual-Channel Segmentation strategy, and builds Note Memory that captures stable knowledge through a multi-stage information extraction pipeline. These two memory types are semantically linked to form a hierarchical structure that bridges concrete interaction events and abstract knowledge, enabling efficient retrieval without sacrificing information fidelity. HiMem supports both hybrid and best-effort retrieval strategies to balance accuracy and efficiency, and incorporates conflict-aware Memory Reconsolidation to revise and supplement stored knowledge based on retrieval feedback. This design enables continual memory self-evolution over long-term use. Experimental results on long-horizon dialogue benchmarks demonstrate that HiMem consistently outperforms representative baselines in accuracy, consistency, and long-term reasoning, while maintaining favorable efficiency. Overall, HiMem provides a principled and scalable design paradigm for building adaptive and self-evolving LLM-based conversational agents. The code is available at https://github.com/jojopdq/HiMem.
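To make the two-level structure concrete, the following is a minimal sketch of how abstract notes might link back to concrete episodes, with a notes-first lookup that falls back to episodes. It is not taken from the HiMem codebase: the names `Episode`, `Note`, `TwoLevelMemory`, `overlap`, and `min_score` are hypothetical, the word-overlap score is a toy stand-in for real semantic retrieval, and whether the fallback order matches HiMem's best-effort strategy is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """A concrete interaction segment (hypothetical schema)."""
    episode_id: str
    text: str


@dataclass
class Note:
    """Abstract, stable knowledge linked back to its source episodes."""
    text: str
    episode_ids: list = field(default_factory=list)


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


class TwoLevelMemory:
    """Assumed two-tier store: notes on top, episodes underneath."""

    def __init__(self) -> None:
        self.episodes = {}  # episode_id -> Episode
        self.notes = []     # list of Note

    def add_episode(self, episode: Episode) -> None:
        self.episodes[episode.episode_id] = episode

    def add_note(self, note: Note) -> None:
        self.notes.append(note)

    def linked_episodes(self, note: Note) -> list:
        """Follow a note's links down to the episode texts it was derived from."""
        return [self.episodes[i].text for i in note.episode_ids if i in self.episodes]

    def retrieve(self, query: str, min_score: int = 2) -> list:
        """Try notes first; fall back to episodes when no note scores high enough."""
        hits = [n for n in self.notes if overlap(query, n.text) >= min_score]
        if hits:
            return [n.text for n in hits]
        ranked = sorted(
            self.episodes.values(),
            key=lambda e: overlap(query, e.text),
            reverse=True,
        )
        return [e.text for e in ranked if overlap(query, e.text) > 0]
```

In this sketch, a query answered by a note never touches the larger episode store, which is one plausible way to trade a little recall for efficiency; the episode fallback preserves access to details that the notes abstracted away.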
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-hop Question Answering | Locomo | F1 | 33.8 | 125 |
| Single-hop Question Answering | Locomo | F1 | 0.491 | 111 |
| Open-domain Question Answering | Locomo | F1 | 0.219 | 111 |
| Temporal Question Answering | Locomo | F1 | 0.468 | 85 |
| Role-playing Quality Evaluation | RoleMemo (test) | Information Richness | 4.05 | 14 |
| Memory Construction Quality | RoleMemo 1.0 (test) | Interpretive Attribution Fact (Recall@10) | 37 | 10 |
| Memory Management | Heavy admitted-content | Recall | 100 | 6 |
| Memory Management and Retrieval | Heavy admitted-content (full matrix) | Recall | 100 | 6 |