ES-Mem: Event Segmentation-Based Memory for Long-Term Dialogue Agents
About
Memory is critical for dialogue agents to maintain coherence and enable continuous adaptation in long-term interactions. While existing memory mechanisms offer basic storage and retrieval capabilities, they are hindered by two primary limitations: (1) rigid memory granularity often disrupts semantic integrity, resulting in fragmented and incoherent memory units; (2) prevalent flat retrieval paradigms rely solely on surface-level semantic similarity, neglecting the structural cues of discourse required to navigate and locate specific episodic contexts. To mitigate these limitations, drawing inspiration from Event Segmentation Theory, we propose ES-Mem, a framework incorporating two core components: (1) a dynamic event segmentation module that partitions long-term interactions into semantically coherent events with distinct boundaries; (2) a hierarchical memory architecture that constructs multi-layered memories and leverages boundary semantics to anchor specific episodic memory for precise context localization. Evaluations on two memory benchmarks demonstrate that ES-Mem yields consistent performance gains over baseline methods. Furthermore, the proposed event segmentation module exhibits robust applicability on dialogue segmentation datasets.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Long-term memory evaluation | Locomo | Overall F1 | 45.56 | 70 |
| Long-context Question Answering | Locomo | Average F1 | 45.56 | 64 |
| Dialogue Memory Accuracy | LongMemEval-S (N=500) | Temporal Accuracy | 64.66 | 17 |
| Dialogue Segmentation | DialSeg711 | Pk | 0.172 | 14 |
| Dialogue Segmentation | SuperDialSeg | Pk | 0.434 | 10 |
| Dialogue Segmentation | TIAGE | Pk | 0.382 | 10 |
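
The dialogue segmentation rows report Pk, a window-based error rate for which lower is better. As a rough illustration of how such a score is typically computed, the sketch below slides a window of width k over the utterances and counts how often the reference and the predicted segmentation disagree on whether the two ends of the window fall in the same segment. The function names, the 0/1 boundary encoding, and the default choice of k (half the mean reference segment length) are assumptions made for this example; the exact evaluation settings behind the numbers in the table are not stated here.

```python
from typing import List, Optional, Sequence


def segment_ids(boundaries: Sequence[int]) -> List[int]:
    """Map 0/1 flags (1 = a segment ends after this utterance) to segment ids."""
    ids, current = [], 0
    for flag in boundaries:
        ids.append(current)
        if flag:
            current += 1
    return ids


def pk(reference: Sequence[int], hypothesis: Sequence[int],
       k: Optional[int] = None) -> float:
    """Illustrative Pk: fraction of windows where reference and hypothesis disagree."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    n = len(reference)
    ref_ids = segment_ids(reference)
    hyp_ids = segment_ids(hypothesis)
    if k is None:
        # Assumed convention: half the mean reference segment length, at least 1.
        num_segments = len(set(ref_ids))
        k = max(1, round(n / (2 * num_segments)))
    windows = n - k
    if windows <= 0:
        raise ValueError("sequence is too short for the chosen window size")
    errors = 0
    for i in range(windows):
        same_in_ref = ref_ids[i] == ref_ids[i + k]
        same_in_hyp = hyp_ids[i] == hyp_ids[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / windows


# Nine utterances split into three events of three utterances each.
reference = [0, 0, 1, 0, 0, 1, 0, 0, 0]
no_boundaries = [0] * 9
print(pk(reference, reference, k=2))      # identical segmentations give 0.0
print(pk(reference, no_boundaries, k=2))  # missing both boundaries is penalized
```

Under this definition a perfect segmentation scores 0, so the DialSeg711 result of 0.172 reflects fewer window-level disagreements than the SuperDialSeg result of 0.434.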