Multi-Layered Memory Architectures for LLM Agents: An Experimental Evaluation of Long-Term Context Retention

About

Long-horizon dialogue systems suffer from semantic drift and unstable memory retention across extended sessions. This paper presents a Multi-Layer Memory Framework that decomposes dialogue history into working, episodic, and semantic layers with adaptive retrieval gating and retention regularization. The architecture controls cross-session drift while maintaining bounded context growth and computational efficiency. Experiments on LoCoMo and LOCCO show improved performance: a 46.85 Success Rate, 0.618 overall F1 (0.594 multi-hop F1), and 56.90% six-period retention, while reducing the false-memory rate to 5.1% and context usage to 58.40%. The results confirm enhanced long-term retention and reasoning stability under constrained context budgets.
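The abstract does not specify the framework's internals, but the layered decomposition it describes can be sketched in miniature. The following is a hypothetical illustration, not the paper's method: the class name, the token-overlap gate (standing in for a learned retrieval gate), and the consolidation API are all assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class LayeredMemory:
    """Toy three-layer memory: working (recent turns), episodic
    (overflowed turns), semantic (consolidated stable facts).
    Hypothetical sketch; not the paper's implementation."""
    working_capacity: int = 4
    working: deque = field(default_factory=deque)
    episodic: list = field(default_factory=list)
    semantic: dict = field(default_factory=dict)

    def observe(self, turn: str) -> None:
        """Add a turn; overflow from the working window becomes episodic."""
        self.working.append(turn)
        while len(self.working) > self.working_capacity:
            self.episodic.append(self.working.popleft())

    def consolidate(self, key: str, fact: str) -> None:
        """Promote a stable fact into the semantic layer."""
        self.semantic[key] = fact

    def retrieve(self, query: str, gate_threshold: float = 0.2) -> list:
        """Adaptive gating: recall episodic entries only when they
        overlap the query enough. A toy Jaccard score over tokens
        stands in for whatever gate the paper actually learns."""
        q = set(query.lower().split())

        def score(text: str) -> float:
            t = set(text.lower().split())
            return len(q & t) / max(len(q | t), 1)

        gated = [m for m in self.episodic if score(m) >= gate_threshold]
        # Context = semantic facts + gated episodic recalls + working window,
        # so the prompt stays bounded instead of growing with session length.
        return list(self.semantic.values()) + gated + list(self.working)
```

The bounded `working` window plus the gate on episodic recall is what keeps context growth under control in this sketch: only recent turns, gated recalls, and consolidated facts ever reach the prompt.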

Sunil Tiwari, Payal Fofadiya • 2026

Related benchmarks

Task                         Dataset     Result                 Rank
Long-horizon dialogue        LOCCO       SR 99.1                5
Long-horizon dialogue        LoCoMo      Success Rate 46.85     4
Structured Reasoning         LoCoMo      F1 Score 61.8          2
Long-context Understanding   LongBench   --                     1
