
MOONSHOT : A Framework for Multi-Objective Pruning of Vision and Large Language Models

About

Weight pruning is a common technique for compressing large neural networks. We focus on the challenging post-training one-shot setting, where a pre-trained model is compressed without any retraining. Existing one-shot pruning methods typically optimize a single objective, such as a layer-wise reconstruction loss or a second-order Taylor approximation of the training loss. We highlight that neither objective alone is consistently the most effective across architectures and sparsity levels. Motivated by this insight, we propose MOONSHOT, a general and flexible framework that extends any single-objective pruning method into a multi-objective formulation by jointly optimizing both the layer-wise reconstruction error and the second-order Taylor approximation of the training loss. MOONSHOT acts as a wrapper around existing pruning algorithms. To enable this integration while maintaining scalability to billion-parameter models, we make careful modeling choices and introduce an efficient procedure for computing the inverse Hessian, preserving the efficiency of state-of-the-art one-shot pruners. When combined with state-of-the-art pruning methods on Llama-3.2 and Llama-2 models, MOONSHOT reduces C4 perplexity by up to 32.6% at 2:4 sparsity and improves zero-shot mean accuracy across seven classification benchmarks by up to 4.9 points. On Vision Transformers, it improves accuracy on ImageNet-1k by over 5 points at 70% sparsity, and on ResNet-50, it yields a 4-point gain at 90% sparsity.
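The multi-objective idea can be sketched in a few lines. The snippet below is a hypothetical illustration, not the paper's actual MOONSHOT algorithm: it blends a Wanda-style layer-wise reconstruction proxy with an OBS-style second-order proxy using diagonal inverse-Hessian entries. The blending weight `alpha`, the max-normalization, and the specific score formulas are all assumptions made for clarity.

```python
import numpy as np

def combined_scores(W, X, hinv_diag, alpha=0.5):
    """Blend two per-weight importance scores (illustrative sketch):
    - reconstruction proxy (Wanda-style): |w_ij| * ||x_j||_2
    - second-order proxy (OBS-style):     w_ij^2 / (2 * [H^-1]_jj)
    Each score is max-normalized to [0, 1] before blending with alpha.
    """
    recon = np.abs(W) * np.linalg.norm(X, axis=0)   # shape (out, in)
    taylor = W**2 / (2.0 * hinv_diag)               # broadcasts over rows
    recon = recon / (recon.max() + 1e-12)
    taylor = taylor / (taylor.max() + 1e-12)
    return alpha * recon + (1.0 - alpha) * taylor

def prune(W, scores, sparsity):
    """Zero out the `sparsity` fraction of weights with the lowest score."""
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    thresh = np.partition(scores.ravel(), k - 1)[k - 1]  # k-th smallest score
    return W * (scores > thresh)

# Toy usage: a 2x2 weight matrix, 4 calibration inputs, identity H^-1 diag.
W = np.array([[1.0, -2.0], [3.0, 0.5]])
X = np.ones((4, 2))
pruned = prune(W, combined_scores(W, X, np.ones(2)), sparsity=0.5)
```

In this toy run the two lowest-scoring weights (1.0 and 0.5) are zeroed, while -2.0 and 3.0 survive. The actual framework additionally handles structured patterns such as 2:4 sparsity and a scalable inverse-Hessian computation, which this sketch omits.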

Gabriel Afriat, Xiang Meng, Shibal Ibrahim, Hussein Hazimeh, Rahul Mazumder • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Language Modeling | WikiText2 | Perplexity | 13.28 | 2839 |
| Language Modeling | WikiText-2 (test) | PPL | 17.04 | 1949 |
| Language Modeling | C4 | Perplexity | 16.56 | 1422 |
| Language Modeling | PTB | Perplexity | 22.1 | 1034 |
| Language Modeling | PTB (test) | Perplexity | 29.21 | 526 |
| Language Modeling | C4 (test) | Perplexity | 22.37 | 342 |
| Question Answering | BoolQ | Accuracy | 63.98 | 317 |
| Zero-shot Classification | Downstream Tasks Zero-shot (BoolQ, HellaSwag, WinoGrande, ARC-e, ARC-c, PIQA, OBQA) | BoolQ Accuracy | 76.13 | 87 |
| Zero-shot Classification | Classification Suite Zero-shot | Average Accuracy (Zero-Shot Suite) | 47.85 | 51 |
| Zero-shot Classification | 7 Classification Tasks | Mean Performance | 45.7 | 7 |
