Towards Adaptive Memory-Based Optimization for Enhanced Retrieval-Augmented Generation

About

Retrieval-Augmented Generation (RAG), by integrating non-parametric knowledge from external knowledge bases into models, has emerged as a promising approach to enhancing response accuracy while mitigating factual errors and hallucinations. This method has been widely applied in tasks such as Question Answering (QA). However, existing RAG methods struggle with open-domain QA tasks because they perform independent retrieval operations and directly incorporate the retrieved information into generation without maintaining a summarizing memory or using adaptive retrieval strategies, leading to noise from redundant information and insufficient information integration. To address these challenges, we propose Adaptive memory-based optimization for enhanced RAG (Amber) for open-domain QA tasks, which comprises an Agent-based Memory Updater, an Adaptive Information Collector, and a Multi-granular Content Filter, working together within an iterative memory updating paradigm. Specifically, Amber integrates and optimizes the language model's memory through a multi-agent collaborative approach, ensuring comprehensive knowledge integration from previous retrieval steps. It dynamically adjusts retrieval queries and decides when to stop retrieval based on the accumulated knowledge, enhancing retrieval efficiency and effectiveness. Additionally, it reduces noise by filtering irrelevant content at multiple levels, retaining essential information to improve overall model performance. We conduct extensive experiments on several open-domain QA datasets, and the results demonstrate the superiority and effectiveness of our method and its components. The source code is available at https://anonymous.4open.science/r/Amber-B203/.
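
The iterative loop described in the abstract (retrieve, filter at more than one granularity, update a memory, adapt the query, decide when to stop) can be illustrated with a small toy sketch. This is not the authors' implementation: Amber uses LLM agents for these steps, whereas the sketch below substitutes plain word overlap, and every name in it (`tokens`, `retrieve`, `filter_content`, `answer_with_memory`, `CORPUS`, `STOPWORDS`) is hypothetical.

```python
import re

# Hypothetical toy data; the paper uses real open-domain QA corpora.
CORPUS = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Bananas are rich in potassium.",
]
STOPWORDS = {"the", "is", "of", "in", "was", "are", "a", "which"}


def tokens(text):
    """Lowercased content words; a crude stand-in for a learned relevance model."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(query, corpus, k=2):
    """Toy retriever: rank passages by word overlap with the query."""
    q = tokens(query)
    return sorted(corpus, key=lambda p: len(tokens(p) & q), reverse=True)[:k]


def filter_content(query, passages, memory):
    """Two-level filter: drop whole passages, then single sentences,
    that share no content word with the query or are already in memory."""
    q = tokens(query)
    kept = []
    for passage in passages:
        if not tokens(passage) & q:  # passage-level filter
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", passage):
            is_new = sentence not in memory and sentence not in kept
            if sentence and is_new and tokens(sentence) & q:  # sentence-level filter
                kept.append(sentence)
    return kept


def answer_with_memory(question, corpus, max_steps=3):
    """Iteratively grow a memory; stop when a step adds nothing new."""
    memory = []
    query = question
    for _ in range(max_steps):
        new = filter_content(query, retrieve(query, corpus), memory)
        if not new:  # adaptive stop: no new evidence was collected
            break
        memory.extend(new)
        # Adaptive query: expand the question with what is known so far.
        query = question + " " + " ".join(memory)
    return memory


question = "Which country is the birthplace city of Marie Curie in?"
print(answer_with_memory(question, CORPUS))
```

In this toy run the first step can only match the Marie Curie passage; the expanded query then reaches the Warsaw passage, and the unrelated passage is never stored. A real system would replace the overlap heuristics with model-based relevance judgments and a summarizing memory.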

Qitao Qin, Yucong Luo, Yihang Lu, Zhibo Chu, Xiaoman Liu, Xianwei Meng • 2025

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	HotpotQA	F1 Score 53.72	294
Multi-hop Question Answering	2WikiMQA	F1 Score 52.73	175
Single-hop Question Answering	TriviaQA	--	147
Long-context Question Answering	LongBench v2	Overall Accuracy 35.85	33
Single-hop Question Answering	SQuAD	F1 Score 39.37	21
Long-form Question Answering	ASQA	str-em 51.3	19
Single-hop Question Answering	Natural Questions	Accuracy 47.8	15
Multi-document Question Answering	MultiDocQA	HotpotQA Accuracy 55.59	14
Single-document Question Answering	Singledoc QA	DuReader Score 20.57	14
Question Answering	LongBookQA en	F1 Score 19.58	5

Showing 10 of 11 rows
