# AllMem: A Memory-centric Recipe for Efficient Long-context Modeling

## About
Large Language Models (LLMs) encounter significant performance bottlenecks on long-sequence tasks due to the computational complexity and memory overhead inherent in the self-attention mechanism. To address these challenges, we introduce **AllMem**, a novel and efficient hybrid architecture that integrates Sliding Window Attention (SWA) with non-linear Test-Time Training (TTT) memory networks. **AllMem** enables models to scale effectively to ultra-long contexts while mitigating catastrophic forgetting. This approach not only overcomes the representational constraints typical of linear memory models but also significantly reduces the computational and memory footprint of long-sequence inference. In addition, we implement a Memory-Efficient Fine-Tuning strategy that replaces standard attention layers in pre-trained models with memory-augmented sliding window layers, allowing any off-the-shelf pre-trained LLM to be efficiently converted into an **AllMem**-based architecture. Empirical evaluations confirm that our 4k-window model achieves near-lossless performance on LongBench (average context ~37k tokens), with a marginal 0.83-point drop relative to full attention. On InfiniteBench at a 128k context, our 8k-window variant outperforms full attention, which validates the effectiveness of our parameterized memory in suppressing noise and maintaining robust long-range modeling without the prohibitive cost of global attention.
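The two building blocks described above can be sketched in a few lines of NumPy. This is an illustrative toy, not the released implementation: the exact masking scheme, the memory network's shape (a two-layer tanh MLP here), the learning rate, and the single-gradient-step write rule are all assumptions made for the sketch.

```python
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Causal attention where each position attends only to the last
    `window` positions (itself included). q, k, v have shape (T, d)."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.full((T, T), -np.inf)
    for i in range(T):
        mask[i, max(0, i - window + 1):i + 1] = 0.0  # visible band
    scores = scores + mask
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

class TTTMemory:
    """Non-linear memory as a tiny two-layer MLP trained at test time.

    write(k, v): one SGD step on the reconstruction loss 0.5*||f(k) - v||^2
    read(q):     f(q) = tanh(q @ W1) @ W2
    Hidden size and learning rate are illustrative hyperparameters.
    """
    def __init__(self, d, hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (d, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, d))
        self.lr = lr

    def read(self, q):
        return np.tanh(q @ self.W1) @ self.W2

    def write(self, k, v):
        h = np.tanh(k @ self.W1)
        err = h @ self.W2 - v                  # dL/d(output)
        g_W2 = np.outer(h, err)
        g_h = (err @ self.W2.T) * (1.0 - h ** 2)  # backprop through tanh
        g_W1 = np.outer(k, g_h)
        self.W1 -= self.lr * g_W1
        self.W2 -= self.lr * g_W2
```

In the hybrid layer, tokens inside the window are handled by `sliding_window_attention`, while evicted key/value pairs are written into the `TTTMemory`, whose `read` output is folded back into the layer; repeated writes let the MLP compress long-range content that a linear memory could not represent.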
## Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | WinoGrande | Accuracy | 53.9 | 776 |
| Long-context Language Understanding | LongBench | M-Avg | 32.12 | 219 |
| Reasoning | GPQA Diamond | Accuracy | 28.79 | 88 |
| Long-context Question Answering | LongBench (test) | HotpotQA | 28.23 | 59 |
| Knowledge | ARC Challenge | ARC-C Score | 74.4 | 31 |
| Knowledge | ARC Easy | ARC-E Score | 84.7 | 31 |
| Math | MATH 500 | Accuracy | 74.4 | 25 |
| Coding | LiveCodeBench v5 | Accuracy | 25.0 | 18 |
| Long-context Question Answering | LV-Eval | -- | -- | 14 |
| General Knowledge | HellaSwag | Accuracy | 59.4 | 13 |