GradPruner: Gradient-Guided Layer Pruning Enabling Efficient Fine-Tuning and Inference for LLMs

About

Fine-tuning Large Language Models (LLMs) with downstream data is often considered time-consuming and expensive. Structured pruning methods are primarily employed to improve the inference efficiency of pre-trained models. However, they often require additional time and memory for training, knowledge distillation, structure search, and other strategies, which makes efficient model fine-tuning hard to achieve. To improve both the training and inference efficiency of downstream task fine-tuning, we introduce GradPruner, which prunes layers of LLMs guided by gradients in the early stages of fine-tuning. GradPruner uses the cumulative gradients of each parameter during the initial phase of fine-tuning to compute the Initial Gradient Information Accumulation Matrix (IGIA-Matrix), which is used to assess the importance of layers and perform pruning. We sparsify the pruned layers based on the IGIA-Matrix and merge them with the remaining layers. Only elements with the same sign are merged, to reduce interference from sign variations. We conducted extensive experiments on two LLMs across eight downstream datasets, including medical, financial, and general benchmark tasks. The results demonstrate that GradPruner achieves a parameter reduction of 40% with only a 0.99% decrease in accuracy. Our code is publicly available.
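
The pipeline described in the abstract (accumulate early gradients, score layers, sparsify the pruned layers, merge only same-sign elements) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not give the exact definition of the IGIA-Matrix, so the sketch assumes it is the sum of absolute gradients over the early steps, assumes a layer is scored by the mean of that matrix, and assumes top-k sparsification. All function names and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np

def accumulate_importance(grad_steps):
    """Sum absolute gradients over early fine-tuning steps.
    Assumed stand-in for the IGIA-Matrix; the paper's exact formula may differ."""
    return np.sum(np.abs(np.stack(grad_steps)), axis=0)

def layer_scores(importance_by_layer):
    """Score each layer by the mean of its accumulated matrix (an assumption)."""
    return {name: float(m.mean()) for name, m in importance_by_layer.items()}

def sparsify(weights, importance, keep_ratio):
    """Zero all entries except the top `keep_ratio` fraction ranked by importance."""
    k = max(1, int(round(keep_ratio * weights.size)))
    threshold = np.sort(importance.ravel())[-k]
    return np.where(importance >= threshold, weights, 0.0)

def sign_aware_merge(kept, pruned_sparse):
    """Add pruned entries into the kept layer only where both signs agree."""
    same_sign = np.sign(kept) == np.sign(pruned_sparse)
    return kept + np.where(same_sign, pruned_sparse, 0.0)

# Toy example: one kept layer, one pruned layer, hand-picked importance values.
kept = np.array([[1.0, -2.0], [0.5, -0.5]])
pruned = np.array([[0.5, 1.0], [-1.0, -0.25]])
importance = np.array([[4.0, 3.0], [2.0, 1.0]])

pruned_sparse = sparsify(pruned, importance, keep_ratio=0.5)
merged = sign_aware_merge(kept, pruned_sparse)
```

In the toy example, only the top-left entry of the pruned layer both survives sparsification and matches the sign of the kept layer, so it is the only entry that changes after merging.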

Wei Huang, Anda Cheng, Yinggui Wang • 2026

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	HellaSwag	Accuracy 96.3	1896
Commonsense Reasoning	WinoGrande	Accuracy 86.1	1442
Medical Question Answering	MedMCQA	Accuracy 63.7	521
Physical Interaction Question Answering	PIQA	Accuracy 89.7	415
Question Answering	ARC	Accuracy 92.3	230
Question Answering	PubMedQA	Accuracy 59.4	145
Financial NLP	FinGPT	Accuracy 86.7	28
Summarization	BillSum	Accuracy 68.7	28
Efficiency Evaluation	Model Efficiency Benchmarking Llama3.1-8B	Training Time 62.4	11
Medical Knowledge Question Answering	MMLU Clinical Knowledge	Accuracy 71.7	10

