
LightMem: Lightweight and Efficient Memory-Augmented Generation

About

Despite their remarkable capabilities, Large Language Models (LLMs) struggle to effectively leverage historical interaction information in dynamic and complex environments. Memory systems enable LLMs to move beyond stateless interactions by introducing persistent information storage, retrieval, and utilization mechanisms. However, existing memory systems often introduce substantial time and computational overhead. To this end, we introduce a new memory system called LightMem, which balances the performance and efficiency of memory systems. Inspired by the Atkinson-Shiffrin model of human memory, LightMem organizes memory into three complementary stages. First, cognition-inspired sensory memory rapidly filters irrelevant information through lightweight compression and groups the remainder by topic. Next, topic-aware short-term memory consolidates these topic-based groups, organizing and summarizing content for more structured access. Finally, long-term memory with sleep-time update employs an offline procedure that decouples consolidation from online inference. On LongMemEval and LoCoMo, using GPT and Qwen backbones, LightMem consistently surpasses strong baselines, improving QA accuracy by up to 7.7% / 29.3%, reducing total token usage by up to 38x / 20.9x and API calls by up to 30x / 55.5x, while purely online test-time costs are even lower, achieving up to 106x / 117x token reduction and 159x / 310x fewer API calls. The code is available at https://github.com/zjunlp/LightMem.
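The three stages described above can be sketched as a minimal pipeline. This is an illustrative sketch only, not LightMem's actual implementation: the function names, the length-based filtering heuristic, and the pre-labeled topics are all assumptions (a real system would use learned compression, an LLM summarizer, and automatic topic segmentation).

```python
from collections import defaultdict

def sensory_filter(messages, min_len=20):
    """Stage 1 (sketch): lightweight filtering -- drop messages too short to
    carry information, then group the rest by topic."""
    groups = defaultdict(list)
    for topic, text in messages:
        if len(text) >= min_len:
            groups[topic].append(text)
    return groups

def consolidate_short_term(groups, max_chars=200):
    """Stage 2 (sketch): collapse each topic group into one structured entry.
    A real system would call an LLM summarizer here instead of truncating."""
    return {topic: " ".join(texts)[:max_chars] for topic, texts in groups.items()}

class LongTermMemory:
    """Stage 3 (sketch): long-term store whose consolidation runs offline
    ('sleep-time'), decoupled from cheap online writes."""
    def __init__(self):
        self.store = {}      # consolidated memory, read at inference time
        self.pending = []    # staged entries awaiting offline consolidation

    def stage(self, summaries):
        # Online path: just append; no expensive work on the request path.
        self.pending.append(summaries)

    def sleep_update(self):
        # Offline path: merge all pending entries into the store.
        for summaries in self.pending:
            for topic, summary in summaries.items():
                self.store.setdefault(topic, []).append(summary)
        self.pending.clear()

# Usage: two informative messages survive filtering; "ok" is dropped.
msgs = [("travel", "I visited Kyoto last spring and loved the temples there."),
        ("travel", "ok"),
        ("work", "My new job at the lab starts in March, focusing on robotics.")]
ltm = LongTermMemory()
ltm.stage(consolidate_short_term(sensory_filter(msgs)))
ltm.sleep_update()
print(sorted(ltm.store))  # → ['travel', 'work']
```

The key efficiency idea mirrored here is that `stage` is the only operation on the online path; all merging happens in `sleep_update`, which can run when the system is idle.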

Jizhan Fang, Xinle Deng, Haoming Xu, Ziyan Jiang, Yuqi Tang, Ziwen Xu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, Huajun Chen, Ningyu Zhang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-hop Question Answering | HotpotQA | F1 Score: 37.68 | 221 |
| Long-term Memory Evaluation | LoCoMo | Overall F1: 44.73 | 70 |
| Multi-hop Question Answering | LoCoMo | F1: 44.86 | 67 |
| Long-context Question Answering | LoCoMo | Average F1: 53.84 | 64 |
| Long-context Memory Retrieval | LoCoMo | Single-hop: 76.61 | 55 |
| Open-domain Question Answering | LoCoMo | F1: 0.2619 | 53 |
| Single-hop Question Answering | LoCoMo | F1: 0.5588 | 53 |
| Long-context Reasoning and Retrieval | LoCoMo (test) | Single-Hop F1: 81.21 | 37 |
| Temporal Question Answering | LoCoMo | F1: 0.6466 | 36 |
| Long-context Memory Evaluation | LongMemEval | Single-Turn Preference: 91.67 | 28 |

(Showing 10 of 25 rows.)
