MIDUS: Memory-Infused Depth Up-Scaling
About
Scaling large language models (LLMs) demands approaches that increase capacity without incurring excessive parameter growth or inference cost. Depth Up-Scaling (DUS) has emerged as a promising strategy that duplicates layers and applies Continual Pre-training (CPT), but its reliance on feed-forward networks (FFNs) limits efficiency and attainable gains. We introduce Memory-Infused Depth Up-Scaling (MIDUS), which replaces the FFNs in duplicated blocks with a head-wise memory layer (HML). Motivated by the observation that attention heads play distinct roles both across and within layers, MIDUS assigns an independent memory bank to each head, enabling head-wise retrieval that injects information into subsequent layers while preserving head-wise functional structure. This design combines sparse memory access with head-wise representations and an efficient per-head value factorization module, relaxing the usual efficiency-performance trade-off. Across our CPT experiments, MIDUS delivers robust performance improvements over strong DUS baselines while maintaining a highly efficient parameter footprint. These findings establish MIDUS, with its head-wise memory design, as a compelling and resource-efficient alternative to conventional FFN replication for depth up-scaling.
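To make the architecture concrete, below is a minimal PyTorch sketch of a head-wise memory layer with per-head memory banks, sparse top-k retrieval, and a per-head low-rank value factorization. All names (`HeadwiseMemoryLayer`, `mem_keys`, `val_factor`), dimensions, and the top-k routing scheme are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeadwiseMemoryLayer(nn.Module):
    """Illustrative sketch of a head-wise memory (HML) layer.

    Each attention head owns an independent memory bank of keys and
    low-rank value codes. For every token, each head retrieves its
    top-k memory slots (sparse access), mixes the retrieved value
    codes by softmax-normalized scores, and expands them back to the
    head dimension with a per-head factorization matrix.
    """

    def __init__(self, n_heads=8, head_dim=64, n_slots=4096, rank=16, top_k=32):
        super().__init__()
        self.n_heads, self.head_dim, self.top_k = n_heads, head_dim, top_k
        # Per-head memory keys: (heads, slots, head_dim)
        self.mem_keys = nn.Parameter(torch.randn(n_heads, n_slots, head_dim) * 0.02)
        # Per-head value codes stored in a low-rank space: (heads, slots, rank)
        self.mem_vals = nn.Parameter(torch.randn(n_heads, n_slots, rank) * 0.02)
        # Per-head factorization that expands rank -> head_dim
        self.val_factor = nn.Parameter(torch.randn(n_heads, rank, head_dim) * 0.02)
        self.out_proj = nn.Linear(n_heads * head_dim, n_heads * head_dim)

    def forward(self, x):
        # x: (batch, seq, n_heads * head_dim), e.g. the block's hidden state
        b, t, _ = x.shape
        q = x.view(b, t, self.n_heads, self.head_dim)             # per-head queries
        # Scores against each head's own memory bank: (b, t, heads, slots)
        scores = torch.einsum("bthd,hsd->bths", q, self.mem_keys)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)   # sparse access
        weights = F.softmax(topk_scores, dim=-1)                  # (b, t, h, k)
        # Gather the selected low-rank value codes: (b, t, h, k, rank)
        idx = topk_idx.permute(2, 0, 1, 3)                        # (h, b, t, k)
        gathered = torch.stack(
            [self.mem_vals[h][idx[h]] for h in range(self.n_heads)], dim=0
        )                                                         # (h, b, t, k, r)
        gathered = gathered.permute(1, 2, 0, 3, 4)                # (b, t, h, k, r)
        mixed = torch.einsum("bthk,bthkr->bthr", weights, gathered)
        # Expand back to head_dim with the per-head factor, then merge heads
        out = torch.einsum("bthr,hrd->bthd", mixed, self.val_factor)
        return self.out_proj(out.reshape(b, t, -1))


if __name__ == "__main__":
    layer = HeadwiseMemoryLayer()
    h = torch.randn(2, 10, 8 * 64)   # (batch, seq, model_dim)
    print(layer(h).shape)            # torch.Size([2, 10, 512])
```

Storing value codes at a small rank and expanding them per head keeps the memory parameter count low, which is the kind of efficiency-performance balance the per-head value factorization is meant to provide.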
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-task Language Understanding | MMLU | Accuracy | 37.82 | 842 |
| Language Modeling | WikiText-103 (test) | Perplexity | 7.4 | 524 |
| Boolean Question Answering | BoolQ | Accuracy | 66.21 | 307 |
| Commonsense Reasoning | WinoGrande | Accuracy | 61.56 | 231 |
| Question Answering | ARC | Accuracy | 66.33 | 154 |
| Logical Reasoning | LogiQA | Accuracy | 23.2 | 84 |
| Physical Reasoning | PIQA | Accuracy | 75.9 | 44 |
| Commonsense Question Answering | CSQA | Accuracy | 50.04 | 44 |
| Zero-shot Question Answering and Reasoning | Zero-shot evaluation suite (ARC, LogiQA, WinoGrande, CSQA, BoolQ, PIQA, MMLU) | Accuracy (ARC) | 83.5 | 21 |
| Language Modeling | Wikipedia | Perplexity | 11.64 | 14 |