
AllMem: A Memory-centric Recipe for Efficient Long-context Modeling

About

Large Language Models (LLMs) face significant performance bottlenecks on long-sequence tasks due to the computational complexity and memory overhead inherent in the self-attention mechanism. To address these challenges, we introduce AllMem, a novel, efficient hybrid architecture that integrates Sliding Window Attention (SWA) with non-linear Test-Time Training (TTT) memory networks. AllMem enables models to scale effectively to ultra-long contexts while mitigating catastrophic forgetting. This approach not only overcomes the representational constraints typical of linear memory models but also substantially reduces the computational and memory footprint of long-sequence inference. We further implement a Memory-Efficient Fine-Tuning strategy that replaces standard attention layers in pre-trained models with memory-augmented sliding window layers, allowing any off-the-shelf pre-trained LLM to be converted efficiently into an AllMem-based architecture. Empirical evaluations confirm that our 4k-window model achieves near-lossless performance on LongBench at a 37k context, with a marginal 0.83-point drop relative to full attention. Moreover, on InfiniteBench at a 128k context, our 8k-window variant outperforms full attention, validating the effectiveness of our parameterized memory in mitigating noise and maintaining robust long-range modeling without the prohibitive cost of global attention.

Ziming Wang, Xiang Wang, Kailong Peng, Lang Qin, Juan Gabriel Kostelec, Christos Sourmpis, Axel Laborieux, Qinghai Guo • 2026
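The hybrid described in the abstract, local attention inside a fixed window plus a non-linear memory trained at test time on tokens leaving that window, can be illustrated with a minimal sketch. This is not the paper's implementation; the shapes, the two-layer tanh memory, the SGD update, and the reconstruction loss are all illustrative assumptions.

```python
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Causal attention where each query attends only to the
    last `window` keys (itself included). Shapes: (T, d)."""
    T, d = q.shape
    out = np.zeros_like(v)
    for t in range(T):
        lo = max(0, t - window + 1)
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(d)
        w = np.exp(scores - scores.max())   # stable softmax
        w /= w.sum()
        out[t] = w @ v[lo:t + 1]
    return out

class TTTMemory:
    """Toy non-linear (two-layer MLP) memory f(key) ~ value,
    updated online by one SGD step per write on the squared
    reconstruction loss -- a TTT-style stand-in, not AllMem's
    actual memory network."""
    def __init__(self, d, hidden=16, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (d, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, d))
        self.lr = lr

    def read(self, key):
        return np.tanh(key @ self.W1) @ self.W2

    def write(self, key, value):
        h = np.tanh(key @ self.W1)
        err = h @ self.W2 - value            # d(loss)/d(pred) for 0.5*||pred - value||^2
        dh = (err @ self.W2.T) * (1.0 - h ** 2)
        self.W2 -= self.lr * np.outer(h, err)
        self.W1 -= self.lr * np.outer(key, dh)
```

In this picture, tokens inside the window get exact local attention, while key/value pairs that slide out of the window are written into the memory, whose parameters (not a fixed-size linear state) carry the long-range information.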

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | WinoGrande | Accuracy | 53.9 | 776 |
| Long-context Language Understanding | LongBench | M-Avg | 32.12 | 219 |
| Reasoning | GPQA Diamond | Accuracy | 28.79 | 88 |
| Long-context Question Answering | LongBench (test) | HotpotQA | 28.23 | 59 |
| Knowledge | ARC Challenge | ARC-C Score | 74.4 | 31 |
| Knowledge | ARC Easy | ARC-E Score | 84.7 | 31 |
| Math | MATH 500 | Accuracy | 74.4 | 25 |
| Coding | LiveCodeBench v5 | Accuracy | 25 | 18 |
| Long-context Question Answering | LV-Eval | -- | -- | 14 |
| General Knowledge | HellaSwag | Accuracy | 59.4 | 13 |

Showing 10 of 16 rows.
