
MemLong: Memory-Augmented Retrieval for Long Text Modeling

About

Recent advancements in Large Language Models (LLMs) have yielded remarkable success across diverse fields. However, handling long contexts remains a significant challenge for LLMs due to the quadratic time and space complexity of attention mechanisms and the growing memory consumption of the key-value cache during generation. This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation, a method designed to enhance the capabilities of long-context language modeling by utilizing an external retriever for historical information retrieval. MemLong combines a non-differentiable "ret-mem" module with a partially trainable decoder-only language model and introduces a fine-grained, controllable retrieval attention mechanism that leverages semantic-level relevant chunks. Comprehensive evaluations on multiple long-context language modeling benchmarks demonstrate that MemLong consistently outperforms other state-of-the-art LLMs. More importantly, MemLong can extend the context length on a single 3090 GPU from 4k up to 80k. Our code is available at https://github.com/Bui1dMySea/MemLong
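
To make the idea of chunk-level retrieval from an external memory more concrete, here is a minimal Python sketch. It is not the authors' implementation: the class name `ChunkMemory`, the cosine-similarity ranking, the oldest-first eviction policy, and the string payloads (standing in for whatever a real system would cache per chunk) are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ChunkMemory:
    """Toy retrieval memory (hypothetical): one embedding and one payload per past chunk."""

    def __init__(self, capacity):
        # Assumed policy: once capacity is reached, the oldest chunk is evicted first.
        self.entries = deque(maxlen=capacity)

    def add(self, embedding, payload):
        """Store a chunk embedding together with its payload."""
        self.entries.append((embedding, payload))

    def retrieve(self, query, k):
        """Return the payloads of the k stored chunks most similar to the query."""
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [payload for _, payload in ranked[:k]]


# Usage: store three chunks, then fetch the two closest to a query embedding.
memory = ChunkMemory(capacity=3)
memory.add([1.0, 0.0], "chunk-0")
memory.add([0.0, 1.0], "chunk-1")
memory.add([0.9, 0.1], "chunk-2")
top = memory.retrieve([1.0, 0.0], k=2)
```

In this sketch the memory is bounded, so the cost of retrieval depends on the memory capacity rather than on the full length of the text seen so far. A real system would presumably use a learned embedding model and an approximate nearest-neighbor index instead of the exhaustive sort used here.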

Weijie Liu, Zecheng Tang, Juntao Li, Kehai Chen, Min Zhang • 2024

Related benchmarks

Task | Dataset | Result | Rank
Language Modeling | WikiText-103 (test) | Perplexity 7.938 | 703
Language Modeling | PG-19 | Perplexity 9.64 | 206
Language Modeling | PG-19 (test) | Perplexity 9.858 | 112
Language Modeling | Proof-pile | Perplexity 2.99 | 92
Language Modeling | WikiText-103 | Perplexity (PPL) 7.87 | 43
Text Classification | NLU Tasks (SST-2, MR, Subj, SST-5, MPQA) | SST-2 Accuracy 93.5 | 13
