
AllMem: A Memory-centric Recipe for Efficient Long-context Modeling

About

Large Language Models (LLMs) encounter significant performance bottlenecks on long-sequence tasks due to the computational complexity and memory overhead inherent in the self-attention mechanism. To address these challenges, we introduce AllMem, a novel and efficient hybrid architecture that integrates Sliding Window Attention (SWA) with non-linear Test-Time Training (TTT) memory networks. AllMem enables models to scale effectively to ultra-long contexts while mitigating catastrophic forgetting. This approach not only overcomes the representational constraints typical of linear memory models but also significantly reduces the computational and memory footprint during long-sequence inference. Furthermore, we implement a Memory-Efficient Fine-Tuning strategy that replaces standard attention layers in pre-trained models with memory-augmented sliding window layers, allowing any off-the-shelf pre-trained LLM to be efficiently converted into an AllMem-based architecture. Empirical evaluations confirm that our 4k-window model achieves near-lossless performance on 37k-context LongBench, with a marginal 0.83-point drop relative to full attention. On InfiniteBench at a 128k context, our 8k-window variant outperforms full attention, validating the effectiveness of our parameterized memory in suppressing noise and maintaining robust long-range modeling without the prohibitive costs of global attention.
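The core mechanism described above can be illustrated with a minimal sketch: each token attends over a local sliding window, while long-range context is carried by a small non-linear memory network that is updated with one gradient step per token at test time. This is an assumption-laden toy version, not the paper's actual parameterization; the additive merge of local attention and memory read, the two-layer tanh MLP, the per-token SGD step, and all names (`TTTMemory`, `allmem_layer`) are illustrative choices.

```python
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Causal attention where token t attends only to the last `window` keys."""
    T, d = q.shape
    out = np.zeros_like(v)
    for t in range(T):
        lo = max(0, t - window + 1)
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[t] = w @ v[lo:t + 1]
    return out

class TTTMemory:
    """Toy non-linear TTT memory: a 2-layer tanh MLP f_W trained online
    (one SGD step per token) to map keys to values."""
    def __init__(self, d, hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (d, hidden))
        self.W2 = rng.normal(0.0, 0.1, (hidden, d))
        self.lr = lr

    def read(self, q):
        # Retrieve long-range information associated with query q.
        return np.tanh(q @ self.W1) @ self.W2

    def write(self, k, v):
        # One SGD step on the reconstruction loss 0.5 * ||f_W(k) - v||^2.
        h = np.tanh(k @ self.W1)
        err = h @ self.W2 - v                      # prediction error, shape (d,)
        gW2 = np.outer(h, err)
        gW1 = np.outer(k, (err @ self.W2.T) * (1.0 - h ** 2))
        self.W1 -= self.lr * gW1
        self.W2 -= self.lr * gW2

def allmem_layer(q, k, v, window, mem):
    """Local SWA output plus a memory read; each token is then folded into memory."""
    local = sliding_window_attention(q, k, v, window)
    out = np.zeros_like(local)
    for t in range(q.shape[0]):
        out[t] = local[t] + mem.read(q[t])         # additive merge (assumption)
        mem.write(k[t], v[t])
    return out
```

Because the memory holds compressed context in its weights rather than in a growing KV cache, per-token cost stays constant in sequence length, which is the efficiency property the abstract claims.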

Ziming Wang, Xiang Wang, Kailong Peng, Lang Qin, Juan Gabriel Kostelec, Christos Sourmpis, Axel Laborieux, Qinghai Guo, 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Commonsense Reasoning | WinoGrande | Accuracy | 53.9 | 1085
Long-context Language Understanding | LongBench | M-Avg | 32.12 | 292
Reasoning | GPQA Diamond | Accuracy | 28.79 | 135
Math | MATH 500 | Accuracy | 74.4 | 86
Long-context Question Answering | LongBench (test) | HotpotQA | 28.23 | 69
Knowledge | ARC Challenge | ARC-C Score | 74.4 | 31
Knowledge | ARC Easy | ARC-E Score | 84.7 | 31
Coding | LiveCodeBench v5 | Accuracy | 25 | 29
General Knowledge | HellaSwag | Accuracy | 59.4 | 27
Long-context Question Answering | LV-Eval | -- | -- | 14

(Showing 10 of 16 rows.)
