$\delta$-mem: Efficient Online Memory for Large Language Models

About

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose $\delta$-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. $\delta$-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an $8\times8$ online memory state, $\delta$-mem improves the average score to $1.10\times$ that of the frozen backbone and $1.15\times$ that of the strongest non-$\delta$-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching $1.31\times$ on MemoryAgentBench and $1.20\times$ on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
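The mechanism described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the standard delta-rule form $S \leftarrow S + \beta\,(v - Sk)\,k^\top$ for the state update and a plain matrix-vector readout. The function names, the write strength `beta`, the hidden size `D`, and the projection matrices `U` and `V` are all hypothetical, introduced only to show how an $8\times8$ state could yield a low-rank correction.

```python
import numpy as np

def delta_rule_update(S, k, v, beta=0.5):
    """One delta-rule step: nudge the value stored under key k toward v.

    S is the fixed-size memory state; beta is an assumed write strength.
    """
    k = k / (np.linalg.norm(k) + 1e-8)   # unit-norm key keeps the step stable
    pred = S @ k                          # what the memory currently returns for k
    return S + beta * np.outer(v - pred, k)

def readout(S, q):
    """Read the memory with query q."""
    return S @ q

def low_rank_correction(S, h, U, V):
    """Hypothetical coupling: project h down, read memory, project back up.

    The result U @ S @ V @ h has rank at most S.shape[0], so it is a
    low-rank correction in the backbone's hidden space.
    """
    return U @ readout(S, V @ h)

d, D = 8, 64                              # 8x8 state as in the abstract; D is assumed
rng = np.random.default_rng(0)
S = np.zeros((d, d))
k, v = rng.normal(size=d), rng.normal(size=d)

# Writing the same (key, value) pair repeatedly shrinks the recall error
# by a factor of (1 - beta) per step.
for _ in range(50):
    S = delta_rule_update(S, k, v)

recalled = readout(S, k / np.linalg.norm(k))

U = rng.normal(size=(D, d)) / np.sqrt(d)  # hypothetical up-projection
V = rng.normal(size=(d, D)) / np.sqrt(D)  # hypothetical down-projection
h = rng.normal(size=D)
delta_h = low_rank_correction(S, h, U, V)
```

The state stays at $8\times8$ however many updates are applied, which is what makes the memory cost independent of history length in this sketch.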

Jingdi Lei, Di Zhang, Junxian Li, Weida Wang, Kaixuan Fan, Xiang Liu, Qihan Liu, Xiaoteng Ma, Baian Chen, Soujanya Poria • 2026

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Instruction Following	IFEval	IFEval Accuracy	82.99	854
Long-context Reasoning	LoCoMo	F1 (Multi Hop)	46.46	78
Memory Agent Performance	MemoryAgentBench	Access Rate	45.65	44
General Evaluation	Aggregate Benchmarks	Average Score	51.66	37
Reasoning	GPQA-D	GPQA-D Score	49.49	22

