
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning

About

Recent studies have shown that supervised fine-tuning of LLMs on a small number of high-quality datasets can yield strong reasoning capabilities. However, full fine-tuning (Full FT), while powerful, is computationally expensive and susceptible to overfitting and catastrophic forgetting, particularly when data is limited. Sparse fine-tuning, which previously achieved notable success by updating only a small subset of model parameters, offers a promising trade-off between efficiency and effectiveness. Yet it has lagged behind in the LLM era because the parameters that are truly critical for reasoning are difficult to identify. In this work, we argue that the weights with the largest magnitude after low-rank approximation are the critical weights for fine-tuning; we call them Principal Weights. Surprisingly, while magnitude-based sparse fine-tuning performs poorly as a baseline for LLM fine-tuning, it becomes highly effective after rank reduction. These insights motivate our method: Low-rank Informed Sparse Fine-Tuning (LIFT). LIFT updates only the top 5% Principal Weights throughout training and consistently achieves better performance on reasoning tasks than Full FT, while maintaining memory efficiency on par with popular parameter-efficient fine-tuning methods. In addition to strong performance on target domains such as arithmetic reasoning, LIFT also retains up to 20% more source-domain knowledge than Full FT and LoRA. Our code is available at: https://github.com/zihanghliu/LIFT.
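The selection rule described in the abstract (reduce rank, then keep the largest-magnitude entries) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `low_rank_approx`, `principal_weight_mask`, and `masked_sgd_step`, the use of a truncated SVD for the low-rank approximation, the example rank, and the plain SGD update are all assumptions made for illustration. Only the 5% density comes from the abstract.

```python
import numpy as np


def low_rank_approx(W, rank):
    """Best rank-`rank` approximation of W via truncated SVD (assumed choice)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Scale the leading left singular vectors by their singular values,
    # then project back with the leading right singular vectors.
    return (U[:, :rank] * S[:rank]) @ Vt[:rank, :]


def principal_weight_mask(W, rank, density=0.05):
    """Boolean mask over W marking the top `density` fraction of entries,
    ranked by magnitude in the low-rank approximation of W, not in W itself."""
    W_r = low_rank_approx(W, rank)
    k = max(1, int(round(density * W.size)))
    flat = np.abs(W_r).ravel()
    # argpartition places the k largest magnitudes in the last k slots.
    top_idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(W.size, dtype=bool)
    mask[top_idx] = True
    return mask.reshape(W.shape)


def masked_sgd_step(W, grad, mask, lr=1e-2):
    """Hypothetical sparse update: only the masked entries of W change."""
    return W - lr * grad * mask


# Usage on a random matrix standing in for one weight matrix of a model.
rng = np.random.default_rng(0)
W = rng.standard_normal((40, 50))
mask = principal_weight_mask(W, rank=4, density=0.05)
grad = rng.standard_normal(W.shape)  # placeholder gradient
W_new = masked_sgd_step(W, grad, mask)
print("trainable entries:", int(mask.sum()), "of", W.size)
```

Ranking by the magnitude of the low-rank approximation rather than of `W` itself is the point of the sketch: it mirrors the abstract's observation that magnitude-based selection becomes effective only after rank reduction. How the rank is chosen and whether the mask is refreshed during training are not stated in the abstract and are left open here.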

Zihang Liu, Tianyu Pang, Oleg Balabanov, Chaoqun Yang, Tianjin Huang, Lu Yin, Yaoqing Yang, Shiwei Liu • 2025

Related benchmarks

Task | Dataset | Result | Rank
Code Generation | HumanEval (test) | Pass@1: 16.46 | 612
Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) | BoolQ Accuracy: 75.7 | 223
Question Answering | StrategyQA (test) | Task Accuracy: 75.85 | 74
Code-Specific Instruction Tuning Evaluation | Magicoder Evaluation Suite | ARC-C Accuracy: 51.23 | 48
Instruction Fine-tuning | MetaMathQA Fine-tuning Evaluation Suite (ARC-C, PIQA, MMLU, HE, GSM8K) (test) | ARC-C Accuracy: 50.14 | 32
Arithmetic Reasoning | Arithmetic Reasoning Benchmarks (MultiArith, GSM8K, AddSub, AQuA, SingleEQ, SVAMP, MAWPS) MATH-10K fine-tuned (test) | MultiArith Accuracy: 99.33 | 24
Math Reasoning | Math Reasoning Tasks (MultiArith, GSM8K, AddSub, AQUA, SingleEq, SVAMP, MAWPS) (test) | MultiArith: 98.17 | 23
Commonsense Reasoning | Commonsense170k (test) | BoolQ Accuracy: 75.4 | 22
Natural Language Understanding | GLUE | MNLI Accuracy: 90.49 | 6
Reasoning | GPQA Diamond (test) | Accuracy: 34.85 | 4
