
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents

About

Large language models face challenges in long-context question answering, where the key evidence for a query may be dispersed across millions of tokens. Existing work equips large language models with a memory buffer that is dynamically updated during a linear scan of the document, the so-called "memorize while reading" approach. While this approach scales efficiently, it suffers from premature pruning of latent evidence, information loss through overwriting, and sparse reinforcement learning signals. To tackle these challenges, we present ReMemR1, which integrates a memory retrieval mechanism into the memory update process, enabling the agent to selectively recall historical memories for non-linear reasoning. To further strengthen training, we propose a multi-level reward design that combines final-answer rewards with dense, step-level signals that guide effective memory use. Together, these contributions mitigate information degradation, improve supervision, and support complex multi-hop reasoning. Extensive experiments demonstrate that ReMemR1 significantly outperforms state-of-the-art baselines on long-context question answering while incurring negligible computational overhead, validating its ability to trade marginal cost for robust long-context reasoning.
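The core idea, "memorize while reading" extended with a recall step before each update, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the class and method names (`RevisitableMemory`, `recall`, `update`) are hypothetical, and keyword-overlap scoring stands in for whatever learned retrieval the agent actually uses.

```python
from dataclasses import dataclass, field

@dataclass
class RevisitableMemory:
    current: str = ""                             # working memory note
    archive: list = field(default_factory=list)   # past notes, kept recallable

    def recall(self, query: str, k: int = 2) -> list:
        # Naive keyword-overlap scoring as a stand-in for learned retrieval:
        # the agent looks back at archived memories instead of only the
        # current buffer, enabling non-linear reasoning over the scan.
        q = set(query.lower().split())
        scored = sorted(self.archive,
                        key=lambda note: -len(q & set(note.lower().split())))
        return scored[:k]

    def update(self, new_note: str) -> None:
        # Instead of overwriting (and losing) the previous note, archive it
        # so later steps can still revisit it.
        if self.current:
            self.archive.append(self.current)
        self.current = new_note

def scan(chunks, query, summarize):
    """Linear scan over document chunks; before each memory update the agent
    may look back at the archive (the recall step the abstract describes)."""
    mem = RevisitableMemory()
    for chunk in chunks:
        recalled = mem.recall(query)              # non-linear lookback
        mem.update(summarize(chunk, recalled))    # summarize() is a stand-in
    return mem                                    # for an LLM call
```

The contrast with plain "memorize while reading" is the `archive` list: a pure overwrite-on-update buffer would discard `current` each step, which is exactly the information loss the paper targets.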

Yaorui Shi, Yuxin Chen, Siyuan Wang, Sihang Li, Hengxing Cai, Qi Gu, Xiang Wang, An Zhang • 2025

Related benchmarks

Task | Dataset | Result | Rank
Long-context Question Answering | HotpotQA (In-Distribution) | Accuracy: 82.8 | 72
Multi-hop Question Answering | 2WikiMultiHopQA (Out-Of-Distribution) | Accuracy: 63.9 | 72
Long-context Question Answering | 2WikiMultiHopQA (Out-Of-Distribution) | Accuracy: 63.9 | 54
Question Answering | HotpotQA (In-Distribution) | Accuracy (50 Docs): 82.3 | 8
