# What Deserves Memory: Adaptive Memory Distillation for LLM Agents

## About
Memory systems for LLM agents struggle to determine what information deserves retention. Existing approaches rely on predefined heuristics such as importance scores, emotional tags, or factual templates, encoding designer intuition rather than learning from the data itself. Inspired by ideas from cognitive science, we propose NEMORI, an adaptive memory distillation framework that casts the assessment of an experience's future utility as a question of predictability. NEMORI comprises two cascading modules: Episodic Memory Integration, which transforms raw interactions into coherent narratives, and Semantic Knowledge Distillation, which extracts insights via prediction error. Because it centers on distillation, the framework remains agnostic to downstream memory management. Extensive experiments confirm that NEMORI achieves strong performance and efficiency while reducing storage. Our work suggests that observing the intrinsic properties of interaction sequences offers a viable, data-driven alternative to heuristic-based memory design. Code: https://github.com/nemori-ai/nemori.
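The two-stage pipeline described above can be sketched in a few lines. This is a minimal illustration of the idea, not NEMORI's actual implementation: all function names are assumptions, and a simple word-overlap score stands in for a learned predictor's error.

```python
def integrate_episode(turns):
    """Stage 1 (Episodic Memory Integration, simplified): fold raw
    interaction turns into a single coherent narrative string."""
    return " ".join(t.strip() for t in turns)

def prediction_error(predicted, observed):
    """Toy proxy for prediction error: 1 minus the Jaccard word overlap
    between what the predictor expected and what actually occurred."""
    p, o = set(predicted.split()), set(observed.split())
    if not (p | o):
        return 0.0
    return 1.0 - len(p & o) / len(p | o)

def distill(episodes, predictor, threshold=0.5):
    """Stage 2 (Semantic Knowledge Distillation, simplified): retain only
    episodes the predictor fails to anticipate -- high prediction error
    signals high future utility."""
    retained = []
    for prev, cur in zip(episodes, episodes[1:]):
        guess = predictor(prev)          # predict the next episode
        if prediction_error(guess, cur) > threshold:
            retained.append(cur)         # surprising -> worth remembering
    return retained
```

For example, with a trivial predictor that expects each episode to repeat the previous one, `distill(["the cat sat", "the cat sat", "dog ran fast"], lambda e: e)` keeps only the surprising episode `"dog ran fast"`; an unsurprising repeat is discarded rather than stored.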
## Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Long-term memory evaluation | Locomo | Overall F1 | 52.1 | 119 |
| Long-context question answering | Locomo | F1 (Multi-Hop) | 32.36 | 109 |
| Long-context memory retrieval | Locomo | Single-hop | 84.9 | 70 |
| Multi-hop question answering | Locomo | F1 | 44.2 | 67 |
| Single-hop question answering | Locomo | F1 | 0.588 | 53 |
| Open-domain question answering | Locomo | F1 | 0.258 | 53 |
| Long-context memory evaluation | LongMemEval | Average Score | 74.6 | 52 |
| Long-context reasoning and retrieval | LoCoMo (test) | Single-Hop F1 | 87.04 | 37 |
| Temporal question answering | Locomo | F1 | 0.5838 | 36 |
| Long-term memory evaluation | LongMemEval S (test) | KU (Knowledge Update) | 79.5 | 27 |