REMem: Reasoning with Episodic Memory in Language Agent

About

Humans excel at remembering concrete experiences in their spatiotemporal contexts and reasoning across those events, i.e., the capacity for episodic memory. In contrast, memory in language agents remains mainly semantic, and current agents are not yet capable of effectively recollecting and reasoning over interaction histories. We identify and formalize the core challenges of episodic recollection and reasoning that arise from this gap, and observe that existing work often overlooks episodicity, lacks explicit event modeling, or overemphasizes simple retrieval rather than complex reasoning. We present REMem, a two-phase framework for constructing and reasoning with episodic memory: (1) offline indexing, where REMem converts experiences into a hybrid memory graph that flexibly links time-aware gists and facts; and (2) online inference, where REMem employs an agentic retriever with carefully curated tools for iterative retrieval over the memory graph. Comprehensive evaluation across four episodic memory benchmarks shows that REMem substantially outperforms state-of-the-art memory systems such as Mem0 and HippoRAG 2, with absolute improvements of 3.4% and 13.4% on episodic recollection and reasoning tasks, respectively. REMem also demonstrates more robust refusal behavior for unanswerable questions.
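To make the two phases concrete, here is a minimal Python sketch of the general idea. It is not REMem's implementation: the `Gist`, `Fact`, and `MemoryGraph` classes, the two lookup "tools", and the fixed `answer_latest` routine are all hypothetical stand-ins, and the agentic retriever described above would choose tool calls iteratively rather than follow a hard-coded path.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Gist:
    """A time-stamped summary of one experience (hypothetical schema)."""
    gist_id: str
    when: date
    text: str


@dataclass
class Fact:
    """A (subject, relation, object) triple linked to its source gist."""
    subject: str
    relation: str
    obj: str
    gist_id: str


@dataclass
class MemoryGraph:
    gists: dict = field(default_factory=dict)  # gist_id -> Gist
    facts: list = field(default_factory=list)  # Fact edges

    def index(self, gist, triples):
        """Offline indexing: store a gist and link its facts to it."""
        self.gists[gist.gist_id] = gist
        for s, r, o in triples:
            self.facts.append(Fact(s, r, o, gist.gist_id))

    def find_facts(self, entity):
        """Illustrative tool: facts that mention an entity."""
        return [f for f in self.facts if entity in (f.subject, f.obj)]

    def gists_between(self, start, end):
        """Illustrative tool: gists in an inclusive date range, oldest first."""
        hits = [g for g in self.gists.values() if start <= g.when <= end]
        return sorted(hits, key=lambda g: g.when)


def answer_latest(graph, entity, relation):
    """Toy online inference: find matching facts, then use the linked
    gists' timestamps to pick the most recent one. Returns None (a
    refusal) when memory holds no supporting fact."""
    candidates = [
        f for f in graph.find_facts(entity)
        if f.subject == entity and f.relation == relation
    ]
    if not candidates:
        return None
    latest = max(candidates, key=lambda f: graph.gists[f.gist_id].when)
    return latest.obj


graph = MemoryGraph()
graph.index(Gist("g1", date(2024, 1, 5), "Alice said she moved to Paris."),
            [("Alice", "lives_in", "Paris")])
graph.index(Gist("g2", date(2024, 6, 2), "Alice mentioned relocating to Berlin."),
            [("Alice", "lives_in", "Berlin")])

print(answer_latest(graph, "Alice", "lives_in"))  # Berlin
print(answer_latest(graph, "Bob", "lives_in"))    # None
```

Linking each fact to a dated gist is what lets the toy query resolve conflicting facts by time, and returning `None` when no fact matches mirrors, in a very simplified way, refusing an unanswerable question.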

Yiheng Shu, Saisri Padmaja Jonnalagedda, Xiang Gao, Bernal Jiménez Gutiérrez, Weijian Qi, Kamalika Das, Huan Sun, Yu Su • 2026

Related benchmarks

Task	Dataset	Result
Memory Agent Performance	MemoryAgentBench	Access Rate: 62.3	44
Episodic Reasoning	Complex-TR	F1 Score: 90.6	10
Episodic Recollection	Locomo	F1 Score (%): 42.4	9
Episodic Reasoning	Test of Time 2,800	EM: 93.1	8
Episodic Recollection	REALTALK	F1 Score: 26.2	8
Question Answering	MuSiQue 1,000 samples (test)	F1 Score: 37.9	3
Refusal Prediction	LoCoMo full	Precision: 73.3	3
Question Answering	2Wiki 1,000 samples (test)	F1 Score: 0.386	3

