Augmenting Language Models with Long-Term Memory
About
Existing large language models (LLMs) can only process fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long histories. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memory retriever and reader. Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness. Enhanced with memory-augmented adaptation training, LongMem can thus memorize long past contexts and use long-term memory for language modeling. The proposed memory retrieval module can handle unlimited-length context in its memory bank to benefit various downstream tasks. In particular, LongMem can enlarge the long-form memory to 65k tokens and thus cache many-shot extra demonstration examples as long-form memory for in-context learning. Experiments show that our method outperforms strong long-context models on ChapterBreak, a challenging long-context modeling benchmark, and achieves remarkable improvements on memory-augmented in-context learning over LLMs. The results demonstrate that the proposed method is effective in helping language models memorize and utilize long-form content. Our code is open-sourced at https://aka.ms/LongMem.
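The decoupled design above can be pictured as a cached key-value memory bank that the side-network queries at inference time. The sketch below is a hypothetical, heavily simplified illustration (the class name `MemoryBank`, the list-based storage, and the dot-product retrieval are illustrative assumptions, not the paper's implementation, which caches attention key-value states and uses approximate nearest-neighbor retrieval):

```python
# Hypothetical sketch of LongMem-style cached memory retrieval:
# hidden states from the frozen backbone are written into a bounded
# memory bank, and a retriever pulls the top-k most similar entries.
from dataclasses import dataclass, field

@dataclass
class MemoryBank:
    capacity: int = 65_000  # LongMem enlarges long-form memory to ~65k tokens
    keys: list = field(default_factory=list)    # cached key vectors
    values: list = field(default_factory=list)  # cached token representations

    def write(self, key, value):
        # Evict the oldest entry first so the bank never exceeds capacity;
        # this is how stale long-term context gets rotated out.
        if len(self.keys) >= self.capacity:
            self.keys.pop(0)
            self.values.pop(0)
        self.keys.append(key)
        self.values.append(value)

    def retrieve(self, query, k=2):
        # Rank cached keys by dot-product similarity to the query
        # and return the values of the top-k matches.
        ranked = sorted(
            range(len(self.keys)),
            key=lambda i: -sum(q * x for q, x in zip(query, self.keys[i])),
        )
        return [self.values[i] for i in ranked[:k]]

bank = MemoryBank(capacity=4)
for i, tok in enumerate(["a", "b", "c", "d"]):
    bank.write([float(i), 1.0], tok)
print(bank.retrieve([3.0, 0.0], k=2))  # → ['d', 'c']
```

In the actual system, the retrieved memory states are consumed by the residual side-network rather than returned directly, which is what lets the frozen backbone stay untouched while the memory is read.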
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dialogue Response Generation | MSC | B-4 Score | 33.3 | 38 |
| Dialogue Response Generation | Chronicle | B-4 | 29.9 | 38 |
| Response Generation | Chronicle and MSC Average | CEA | 47.3 | 30 |
| Event Correlation Evaluation | Chronicle, MSC, and LoCoMo Average | CEA | 43.2 | 12 |
| Dialogue Response Generation | LoCoMo | BLEU-4 | 25.3 | 8 |
| Instruction Following with Long-term Memory | Human Evaluation 1-10 scale (test) | Coherence | 7.7 | 6 |
| Generation and retrieval | MiSC multi-speaker | Coherence | 71 | 3 |