
ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization

About

We introduce ReplaceMe, a generalized training-free depth pruning method that effectively replaces transformer blocks with a linear operation, while maintaining high performance for low compression ratios. In contrast to conventional pruning approaches that require additional training or fine-tuning, our approach requires only a small calibration dataset that is used to estimate a linear transformation, which approximates the pruned blocks. The estimated linear mapping can be seamlessly merged with the remaining transformer blocks, eliminating the need for any additional network parameters. Our experiments show that ReplaceMe consistently outperforms other training-free approaches and remains highly competitive with state-of-the-art pruning methods that involve extensive retraining/fine-tuning and architectural modifications. Applied to several large language models (LLMs), ReplaceMe achieves up to 25% pruning while retaining approximately 90% of the original model's performance on open benchmarks, without any training or healing steps, resulting in minimal computational overhead. We provide an open-source library implementing ReplaceMe alongside several state-of-the-art depth pruning techniques, available at https://github.com/mts-ai/ReplaceMe
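
The core idea in the abstract, estimating a linear map from calibration activations and folding it into an existing weight matrix, can be illustrated with a small NumPy sketch. This is not the ReplaceMe library's API or the paper's exact objective: the function names, the ordinary least-squares criterion, and the synthetic "activations" are all assumptions made for illustration only.

```python
import numpy as np


def estimate_linear_map(X, Y):
    """Return T minimizing ||X @ T - Y||_F (ordinary least squares).

    X: (n_tokens, d_model) activations entering the pruned blocks.
    Y: (n_tokens, d_model) activations leaving the pruned blocks.
    Assumption: a plain least-squares fit; the paper may use another objective.
    """
    T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return T


def merge_into_projection(W_out, T):
    """Fold T into a preceding projection: (h @ W_out) @ T == h @ (W_out @ T)."""
    return W_out @ T


# Synthetic stand-in for calibration data (hypothetical sizes).
rng = np.random.default_rng(0)
d_hidden, d_model, n_tokens = 32, 16, 512
H = rng.normal(size=(n_tokens, d_hidden))       # inputs to the preceding projection
W_out = rng.normal(size=(d_hidden, d_model))    # preceding projection weight
X = H @ W_out                                   # activations before the pruned blocks

# Pretend the pruned blocks act almost linearly on X, plus a little noise.
T_true = np.eye(d_model) + 0.1 * rng.normal(size=(d_model, d_model))
Y = X @ T_true + 0.01 * rng.normal(size=X.shape)

T = estimate_linear_map(X, Y)
W_merged = merge_into_projection(W_out, T)      # same shape as W_out: no new parameters

# Relative error of the merged projection against the pruned blocks' output.
rel_err = np.linalg.norm(H @ W_merged - Y) / np.linalg.norm(Y)
```

Because the merged weight has the same shape as the original projection, the sketch adds no parameters, which is the property the abstract describes. In a real model, `X` and `Y` would come from forward passes over the calibration set rather than from random data.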

Dmitriy Shopkhoev, Ammar Ali, Magauiya Zhussip, Valentin Malykh, Stamatios Lefkimmiatis, Nikos Komodakis, Sergey Zagoruyko • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Language Modeling | WikiText2 | Perplexity 9.46 | 3785 |
| Commonsense Reasoning | HellaSwag | Accuracy 65.7 | 1896 |
| Language Modeling | WikiText | PPL 34 | 740 |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy 45.09 | 711 |
| Question Answering | ARC-E | Accuracy 65.9 | 523 |
| Multitask Language Understanding | MMLU | Accuracy 46.4 | 520 |
| Question Answering | PIQA | Accuracy 73.1 | 505 |
| Physical Interaction Question Answering | PIQA | Accuracy 71.1 | 415 |
| Language Modeling | LAMBADA | Accuracy 42.1 | 412 |
| Multi-task Language Understanding | MMLU | Accuracy 51.7 | 353 |
Showing 10 of 44 rows
