ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization
About
Large language models (LLMs) present significant deployment challenges due to their immense computational and memory requirements. While semi-structured pruning, particularly 2:4 sparsity, offers a path to practical hardware acceleration, existing methods often incur substantial performance degradation. To bridge this gap, we introduce ARMOR (Adaptive Representation with Matrix-factORization), a novel one-shot post-training pruning algorithm. Instead of directly pruning weights, ARMOR factorizes each weight matrix into a 2:4 sparse core wrapped by two low-overhead, block diagonal matrices. These wrappers act as efficient pre- and post-transformation error correctors, offering greater flexibility to preserve model quality than conventional 2:4 pruning techniques. The sparse core and block diagonal wrappers are chosen through a block coordinate descent algorithm that minimizes a layer-wise proxy loss. We theoretically prove that this optimization is guaranteed to converge to a solution with a proxy loss less than or equal to that of state-of-the-art pruning algorithms. Experiments on the Llama (Touvron et al., 2023; Dubey et al., 2024) and Qwen (Yang et al., 2025) model families demonstrate that ARMOR consistently and significantly outperforms state-of-the-art 2:4 pruning methods across a wide range of downstream tasks and perplexity evaluations. ARMOR achieves this superior performance while retaining the inference speedups and substantial memory savings of 2:4 pruning, establishing a more effective trade-off between model compression and task accuracy.
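To make the factorization concrete, the sketch below shows the two ingredients the abstract describes: a 2:4 sparsity projection (in every group of four consecutive weights, keep the two largest-magnitude entries) and block diagonal wrapper matrices applied before and after the sparse core. The reconstruction `B_out @ S @ B_in` and the identity initialization of the wrappers are illustrative assumptions, not the paper's exact parameterization or its block coordinate descent updates.

```python
import numpy as np

def prune_2_4(W):
    """Project W onto 2:4 sparsity: in each group of 4 consecutive
    weights along a row, zero the 2 smallest-magnitude entries."""
    out = W.copy()
    rows, cols = out.shape
    assert cols % 4 == 0, "columns must be a multiple of 4"
    groups = out.reshape(rows, cols // 4, 4)  # view into `out`
    # Indices of the 2 smallest |w| within each group of 4.
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
    np.put_along_axis(groups, drop, 0.0, axis=-1)
    return out

def block_diag(blocks):
    """Assemble a block diagonal matrix from square blocks."""
    n = sum(b.shape[0] for b in blocks)
    M = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        M[i:i + k, i:i + k] = b
        i += k
    return M

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
S = prune_2_4(W)  # 2:4 sparse core

# Hypothetical wrappers: block diagonal error correctors, here
# identity-initialized (block size 4 is an arbitrary choice).
B_out = block_diag([np.eye(4), np.eye(4)])
B_in = block_diag([np.eye(4), np.eye(4)])
W_hat = B_out @ S @ B_in  # assumed reconstruction of W
```

Because the wrappers are block diagonal, they add only a small number of dense parameters per layer while still letting the optimization rotate and rescale activations around the fixed 2:4 sparse core.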
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 4.8 | 1949 |
| Commonsense Reasoning | WinoGrande | -- | -- | 1085 |
| Multitask Language Understanding | MMLU | Accuracy | 82.4 | 413 |
| Commonsense Reasoning | HellaSwag | Accuracy | 62.64 | 350 |
| Logical Reasoning | BBH | Accuracy | 68.28 | 201 |
| Language Modeling | WikiText-2 | Perplexity | 4.55 | 162 |
| Scientific Reasoning | ARC Challenge | Accuracy | 63.4 | 94 |
| Commonsense Inference | HellaSwag | Accuracy | 53.77 | 91 |
| Language Understanding | MMLU | MMLU Score | 71.43 | 70 |
| Science Question Answering | GPQA | Accuracy | 40.4 | 46 |