EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory

About

Long-term conversational memory requires retrieving evidence scattered across multiple sessions, yet single-pass retrieval fails on temporal and multi-hop questions. Existing iterative methods refine queries via generated content or document-level signals, but none explicitly diagnoses the evidence gap, namely what is missing from the accumulated retrieval set, leaving query refinement untargeted. We present EviMem, combining IRIS (Iterative Retrieval via Insufficiency Signals), a closed-loop framework that detects evidence gaps through sufficiency evaluation, diagnoses what is missing, and drives targeted query refinement, with LaceMem (Layered Architecture for Conversational Evidence Memory), a coarse-to-fine memory hierarchy supporting fine-grained gap diagnosis. On LoCoMo, EviMem improves Judge Accuracy over MIRIX on temporal (73.3% to 81.6%) and multi-hop (65.9% to 85.2%) questions at 4.5x lower latency. Code: https://github.com/AIGeeksGroup/EviMem.
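The closed loop described above (retrieve, evaluate sufficiency, diagnose what is missing, refine the query) can be illustrated with a toy sketch. This is not the authors' implementation: the keyword-overlap retriever, the `GapReport` structure, and the `diagnose_gap` function are hypothetical stand-ins, and a simple term check takes the place of the sufficiency evaluation that the abstract describes. It also uses a flat list of memory entries rather than a layered hierarchy such as LaceMem.

```python
from dataclasses import dataclass, field


@dataclass
class GapReport:
    """Result of a sufficiency check (hypothetical structure)."""
    sufficient: bool
    missing: list[str] = field(default_factory=list)


def retrieve(query_terms: set[str], memory: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank memory entries by keyword overlap with the query."""
    scored = []
    for entry in memory:
        overlap = len(set(entry.lower().split()) & query_terms)
        if overlap > 0:
            scored.append((overlap, entry))
    # Stable sort: ties keep their original memory order.
    scored.sort(key=lambda pair: -pair[0])
    return [entry for _, entry in scored[:k]]


def diagnose_gap(required_terms: set[str], evidence: list[str]) -> GapReport:
    """Stand-in for a sufficiency judge (an assumption, not the paper's method):
    report which required terms do not yet appear in the accumulated evidence."""
    covered: set[str] = set()
    for entry in evidence:
        covered |= set(entry.lower().split())
    missing = sorted(required_terms - covered)
    return GapReport(sufficient=not missing, missing=missing)


def iterative_retrieve(query_terms, required_terms, memory, max_rounds=3):
    """Retrieve, check sufficiency, and re-query for what is missing."""
    evidence: list[str] = []
    query = set(query_terms)
    for round_no in range(1, max_rounds + 1):
        for entry in retrieve(query, memory):
            if entry not in evidence:
                evidence.append(entry)
        report = diagnose_gap(set(required_terms), evidence)
        if report.sufficient:
            return evidence, round_no
        # Targeted refinement: the next query is built from the diagnosed gap.
        query = set(report.missing)
    return evidence, max_rounds


# Toy memory spanning several "sessions" (invented example data).
MEMORY = [
    "alice adopted a dog named rex in march",
    "rex visited the vet in june",
    "bob started a new job in may",
]

evidence, rounds = iterative_retrieve({"alice", "dog"}, {"dog", "vet"}, MEMORY)
```

In this toy run the first pass finds only the adoption entry, the gap check reports that the vet visit is still missing, and the second pass queries for exactly that term. The point of the sketch is the control flow: the refined query is derived from the diagnosed gap rather than from the original question.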

Yuyang Li, Yime He, Zeyu Zhang, Dong Gong • 2026

Related benchmarks

Task | Dataset | Result | Rank
Long-context Conversational Question Answering | LoCoMo Overall | G-EVAL 2.81 | 3
Long-context Conversational Question Answering | LoCoMo Single-Hop | G-EVAL Score 2.98 | 3
Long-context Conversational Question Answering | LoCoMo Multi-Hop | G-EVAL 2.89 | 3
Long-context Conversational Question Answering | LoCoMo Temporal | G-EVAL 3.08 | 3
Long-context Conversational Question Answering | LoCoMo Adversarial | G-EVAL Score 1.94 | 3
Long-context Conversational Question Answering | LoCoMo Open-Domain | G-EVAL 3.17 | 3
