
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents

About

Large language models face challenges in long-context question answering, where the key evidence for a query may be dispersed across millions of tokens. Existing work equips large language models with a memory buffer that is dynamically updated during a linear scan of the document, the so-called "memorize while reading" approach. While this approach scales efficiently, it suffers from premature pruning of latent evidence, information loss through overwriting, and sparse reinforcement learning signals. To tackle these challenges, we present ReMemR1, which integrates a memory retrieval mechanism into the memory update process, enabling the agent to selectively recall historical memories for non-linear reasoning. To further strengthen training, we propose a multi-level reward design that combines final-answer rewards with dense, step-level signals that guide effective memory use. Together, these contributions mitigate information degradation, improve supervision, and support complex multi-hop reasoning. Extensive experiments demonstrate that ReMemR1 significantly outperforms state-of-the-art baselines on long-context question answering while incurring negligible computational overhead, validating its ability to trade marginal cost for robust long-context reasoning.
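The core idea, "memorize while reading" extended with a recall step before each update, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the class and method names (`RevisitableMemory`, `recall`, `update`) are hypothetical, and keyword-overlap scoring stands in for whatever learned retrieval the agent actually uses.

```python
from dataclasses import dataclass, field

@dataclass
class RevisitableMemory:
    current: str = ""                             # working memory note
    archive: list = field(default_factory=list)   # past notes, kept recallable

    def recall(self, query: str, k: int = 2) -> list:
        # Naive keyword-overlap scoring as a stand-in for learned retrieval:
        # the agent looks back at archived memories instead of only the
        # current buffer, enabling non-linear reasoning over the scan.
        q = set(query.lower().split())
        scored = sorted(self.archive,
                        key=lambda note: -len(q & set(note.lower().split())))
        return scored[:k]

    def update(self, new_note: str) -> None:
        # Instead of overwriting (and losing) the previous note, archive it
        # so later steps can still revisit it.
        if self.current:
            self.archive.append(self.current)
        self.current = new_note

def scan(chunks, query, summarize):
    """Linear scan over document chunks; before each memory update the agent
    may look back at the archive (the recall step the abstract describes)."""
    mem = RevisitableMemory()
    for chunk in chunks:
        recalled = mem.recall(query)              # non-linear lookback
        mem.update(summarize(chunk, recalled))    # summarize() is a stand-in
    return mem                                    # for an LLM call
```

The contrast with plain "memorize while reading" is the `archive` list: a pure overwrite-on-update buffer would discard `current` each step, which is exactly the information loss the paper targets.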

Yaorui Shi, Yuxin Chen, Siyuan Wang, Sihang Li, Hengxing Cai, Qi Gu, Xiang Wang, An Zhang • 2025

Related benchmarks

Task | Dataset | Result | Rank
Long-context Question Answering | HotpotQA (In-Distribution) | Accuracy: 82.8 | 72
Multi-hop Question Answering | 2WikiMultiHopQA (Out-Of-Distribution) | Accuracy: 63.9 | 72
Long-context Question Answering | 2WikiMultiHopQA (Out-Of-Distribution) | Accuracy: 63.9 | 54
Question Answering | HotpotQA (In-Distribution) | Accuracy (50 Docs): 82.3 | 8
