MIDUS: Memory-Infused Depth Up-Scaling

About

Expanding pre-trained language models offers a practical way to increase capacity without training larger models from scratch. Depth Up-Scaling (DUS) does so by duplicating Transformer blocks and inserting them into a pre-trained backbone. This process also duplicates FFN-heavy blocks, increasing parameter and compute cost while adding capacity through a block-level dense residual branch. Yet prior work suggests that added capacity need not remain tied to dense FFN branches, while attention heads often play heterogeneous roles, motivating more efficient head-level residual corrections. We propose Memory-Infused Depth Up-Scaling (MIDUS), which replaces the duplicated FFN branches with memory layers and turns added depth into lightweight retrieval-based residual capacity. We introduce a Head-wise Memory Layer (HML), which combines multi-head product-key memory with Head-wise Implicit Value Expansion (HIVE). HML assigns each head a distinct key space, while HIVE realizes head-specific values from a shared latent bank through compact projections. Alongside empirical improvements in performance and efficiency, our head-importance and fixed-retrieval structural analyses characterize HML with HIVE as a structurally distinct, head-conditioned alternative to FFN-based residual expansion.
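The retrieval path described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes the standard product-key memory lookup (a query split into two halves, each scored against its own sub-keys) and one hypothetical reading of HIVE, in which a per-head matrix projects rows of a shared latent bank into head-specific values. All names (`product_key_lookup`, `head_value`, `memory_head_output`, the dictionary fields) and all sizes are illustrative, and every head reuses the same query for brevity.

```python
import heapq
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(scores, k):
    """Return (index, score) pairs for the k largest scores."""
    return heapq.nlargest(k, enumerate(scores), key=lambda p: p[1])


def product_key_lookup(query, sub_keys_a, sub_keys_b, k):
    """Product-key retrieval for one head.

    Slot (i, j) scores score_a[i] + score_b[j], so (ties aside) the k best
    slots overall always lie among the k x k pairs built from the per-half
    top-k lists. Returns (slot_index, softmax_weight) pairs.
    """
    half = len(query) // 2
    best_a = top_k([dot(query[:half], key) for key in sub_keys_a], k)
    best_b = top_k([dot(query[half:], key) for key in sub_keys_b], k)
    n_b = len(sub_keys_b)
    candidates = [(i * n_b + j, s_i + s_j)
                  for i, s_i in best_a for j, s_j in best_b]
    chosen = heapq.nlargest(k, candidates, key=lambda p: p[1])
    peak = max(score for _, score in chosen)
    exps = [math.exp(score - peak) for _, score in chosen]
    total = sum(exps)
    return [(slot, e / total) for (slot, _), e in zip(chosen, exps)]


def head_value(latent_row, head_proj):
    """Assumed HIVE-style step: project a shared latent row per head."""
    return [dot(latent_row, out_row) for out_row in head_proj]


def memory_head_output(query, head, latent_bank, k):
    """Weighted sum of the head-specific values of the retrieved slots."""
    out = [0.0] * len(head["proj"])
    for slot, weight in product_key_lookup(query, head["keys_a"], head["keys_b"], k):
        value = head_value(latent_bank[slot], head["proj"])
        out = [o + weight * v for o, v in zip(out, value)]
    return out


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes, chosen only for illustration.
rng = random.Random(0)
n_sub, d_query, d_latent, d_out, n_heads, k = 4, 6, 3, 5, 2, 2
latent_bank = random_matrix(rng, n_sub * n_sub, d_latent)  # shared by all heads
heads = [
    {
        "keys_a": random_matrix(rng, n_sub, d_query // 2),  # distinct key space per head
        "keys_b": random_matrix(rng, n_sub, d_query // 2),
        "proj": random_matrix(rng, d_out, d_latent),  # compact per-head projection
    }
    for _ in range(n_heads)
]
query = [rng.gauss(0.0, 1.0) for _ in range(d_query)]
outputs = [memory_head_output(query, head, latent_bank, k) for head in heads]
```

In this sketch each head scores only `2 * n_sub` sub-keys to address `n_sub * n_sub` slots, and the value storage is a single latent bank plus one small projection per head, rather than a full value table per head.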

Taero Kim, Hoyoon Byun, Youngjun Choi, Sungrae Park, Kyungwoo Song • 2025

Related benchmarks

Task | Dataset | Result | Rank
Multi-task Language Understanding | MMLU | Accuracy 37.82 | 881
Language Modeling | WikiText-103 (test) | Perplexity 7.4 | 703
Commonsense Reasoning | WinoGrande | Accuracy 61.56 | 453
Boolean Question Answering | BoolQ | Accuracy 66.21 | 350
Question Answering | ARC | Accuracy 66.33 | 230
Logical reasoning | LogiQA | Accuracy 23.2 | 100
Physical Reasoning | PIQA | Accuracy 75.9 | 90
Commonsense Question Answering | CSQA | Accuracy 50.04 | 61
Language Modeling | Wikipedia | Perplexity 11.64 | 43
Zero-shot Question Answering and Reasoning | Evaluation Suite Zero-shot (ARC, LogiQA, Wino, CSQA, BoolQ, PIQA, MMLU) | ARC 83.5 | 21
Showing 10 of 12 rows
