ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization
About
Large language models (LLMs) present significant deployment challenges due to their immense computational and memory requirements. While semi-structured pruning, particularly 2:4 sparsity, offers a path to practical hardware acceleration, existing methods often incur substantial performance degradation. To bridge this gap, we introduce ARMOR (Adaptive Representation with Matrix-factORization), a novel one-shot post-training pruning algorithm. Instead of directly pruning weights, ARMOR factorizes each weight matrix into a 2:4 sparse core wrapped by two low-overhead, block diagonal matrices. These wrappers act as efficient pre- and post-transformation error correctors, offering greater flexibility to preserve model quality than conventional 2:4 pruning techniques. The sparse core and block diagonal wrappers are chosen through a block coordinate descent algorithm that minimizes a layer-wise proxy loss. We theoretically prove that this optimization is guaranteed to converge to a solution with a proxy loss less than or equal to that of state-of-the-art pruning algorithms. Experiments on the Llama (Touvron et al., 2023; Dubey et al., 2024) and Qwen (Yang et al., 2025) model families demonstrate that ARMOR consistently and significantly outperforms state-of-the-art 2:4 pruning methods across a wide range of downstream tasks and perplexity evaluations. ARMOR achieves this superior performance while retaining the inference speedups and substantial memory savings of 2:4 pruning, establishing a more effective trade-off between model compression and task accuracy.
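To make the factorization concrete, the sketch below shows the two ingredients the abstract describes: a 2:4 sparsity projection (in every group of four consecutive weights, keep the two largest-magnitude entries) and block diagonal wrapper matrices applied before and after the sparse core. The reconstruction `B_out @ S @ B_in` and the identity initialization of the wrappers are illustrative assumptions, not the paper's exact parameterization or its block coordinate descent updates.

```python
import numpy as np

def prune_2_4(W):
    """Project W onto 2:4 sparsity: in each group of 4 consecutive
    weights along a row, zero the 2 smallest-magnitude entries."""
    out = W.copy()
    rows, cols = out.shape
    assert cols % 4 == 0, "columns must be a multiple of 4"
    groups = out.reshape(rows, cols // 4, 4)  # view into `out`
    # Indices of the 2 smallest |w| within each group of 4.
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
    np.put_along_axis(groups, drop, 0.0, axis=-1)
    return out

def block_diag(blocks):
    """Assemble a block diagonal matrix from square blocks."""
    n = sum(b.shape[0] for b in blocks)
    M = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        M[i:i + k, i:i + k] = b
        i += k
    return M

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
S = prune_2_4(W)  # 2:4 sparse core

# Hypothetical wrappers: block diagonal error correctors, here
# identity-initialized (block size 4 is an arbitrary choice).
B_out = block_diag([np.eye(4), np.eye(4)])
B_in = block_diag([np.eye(4), np.eye(4)])
W_hat = B_out @ S @ B_in  # assumed reconstruction of W
```

Because the wrappers are block diagonal, they add only a small number of dense parameters per layer while still letting the optimization rotate and rescale activations around the fixed 2:4 sparse core.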
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 4.8 | 1949 |
| Commonsense Reasoning | WinoGrande | -- | -- | 1085 |
| Multitask Language Understanding | MMLU | Accuracy | 82.4 | 413 |
| Commonsense Reasoning | HellaSwag | Accuracy | 62.64 | 350 |
| Logical Reasoning | BBH | Accuracy | 68.28 | 201 |
| Language Modeling | WikiText-2 | Perplexity | 4.55 | 162 |
| Scientific Reasoning | ARC Challenge | Accuracy | 63.4 | 94 |
| Commonsense Inference | HellaSwag | Accuracy | 53.77 | 91 |
| Language Understanding | MMLU | MMLU Score | 71.43 | 70 |
| Science Question Answering | GPQA | Accuracy | 40.4 | 46 |