From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

About

Our ability to continuously acquire, organize, and leverage knowledge is a key feature of human intelligence that AI systems must approximate to unlock their full potential. Given the challenges in continual learning with large language models (LLMs), retrieval-augmented generation (RAG) has become the dominant way to introduce new information. However, its reliance on vector retrieval hinders its ability to mimic the dynamic and interconnected nature of human long-term memory. Recent RAG approaches augment vector embeddings with various structures like knowledge graphs to address some of these gaps, namely sense-making and associativity. However, their performance on more basic factual memory tasks drops considerably below standard RAG. We address this unintended deterioration and propose HippoRAG 2, a framework that outperforms standard RAG comprehensively on factual, sense-making, and associative memory tasks. HippoRAG 2 builds upon the Personalized PageRank algorithm used in HippoRAG and enhances it with deeper passage integration and more effective online use of an LLM. This combination pushes this RAG system closer to the effectiveness of human long-term memory, achieving a 7% improvement in associative memory tasks over the state-of-the-art embedding model while also exhibiting superior factual knowledge and sense-making memory capabilities. This work paves the way for non-parametric continual learning for LLMs. Code and data are available at https://github.com/OSU-NLP-Group/HippoRAG.

Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, Yu Su • 2025
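The abstract names Personalized PageRank as the graph-search component that HippoRAG 2 builds on. As a rough illustration of that general algorithm only, and not the authors' implementation, the sketch below runs power-iteration Personalized PageRank over a toy adjacency list. The function name `personalized_pagerank`, the toy graph, the seed weights, and the damping value are all assumptions made for this example.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    damping: float = 0.5,   # illustrative value, not taken from the paper
    iterations: int = 50,
) -> Dict[str, float]:
    """Power-iteration Personalized PageRank over an adjacency list."""
    nodes = list(graph)
    total = sum(seeds.values())
    # Reset distribution: all restart probability sits on the seed nodes.
    reset = {n: seeds.get(n, 0.0) / total for n in nodes}
    scores = dict(reset)
    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps back to a seed.
        nxt = {n: (1.0 - damping) * reset[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if not neighbors:
                # Dangling node: hand its mass back to the seed nodes.
                for m in nodes:
                    nxt[m] += damping * scores[n] * reset[m]
                continue
            # Otherwise spread the node's mass evenly over its neighbors.
            share = damping * scores[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        scores = nxt
    return scores


# Hypothetical four-node chain a - b - c - d, with the query seeded at "a".
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_scores = personalized_pagerank(toy_graph, seeds={"a": 1.0})
ranked = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

In a retrieval setting the seeds would typically be graph nodes matched to the query, and the resulting scores would be used to rank nodes or the passages attached to them. In this toy chain, nodes closer to the seed receive higher scores.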

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Node Classification | Cora | Accuracy 64.8 | 1215 |
| Multi-hop Question Answering | 2WikiMultihopQA | EM 75.4 | 387 |
| Multi-hop Question Answering | HotpotQA | F1 Score 73.33 | 294 |
| Multi-hop Question Answering | HotpotQA (test) | F1 75.5 | 255 |
| Multi-hop Question Answering | 2WikiMultiHopQA (test) | EM 44.5 | 195 |
| Node Classification | REDDIT | Accuracy 65 | 192 |
| Question Answering | 2Wiki | F1 69.7 | 152 |
| Multi-hop Question Answering | 2Wiki | Exact Match 60.5 | 152 |
| Question Answering | HotpotQA | F1 71.1 | 128 |
| Long-term memory evaluation | Locomo | Overall F1 27.55 | 119 |
Showing 10 of 197 rows