Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning
About
The surge in Large Language Models (LLMs) has revolutionized natural language processing, but fine-tuning them for specific tasks often encounters challenges in balancing performance and preserving general instruction-following abilities. In this paper, we posit that the distribution gap between task datasets and the LLMs is the primary underlying cause. To address the problem, we introduce Self-Distillation Fine-Tuning (SDFT), a novel approach that bridges the distribution gap by guiding fine-tuning with a distilled dataset generated by the model itself to match its original distribution. Experimental results on the Llama-2-chat model across various benchmarks demonstrate that SDFT effectively mitigates catastrophic forgetting while achieving comparable or superior performance on downstream tasks compared to vanilla fine-tuning. Moreover, SDFT demonstrates the potential to maintain the helpfulness and safety alignment of LLMs. Our code is available at https://github.com/sail-sg/sdft.
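To make the core idea concrete, the sketch below illustrates one plausible reading of the self-distillation step: for each task example, the seed model is prompted to restate the reference answer in its own words, and the rewritten response replaces the original target before standard supervised fine-tuning. The prompt template, model name, and helper function are illustrative assumptions, not the repository's actual code; see the linked repo for the official implementation.

```python
# Minimal sketch of an SDFT-style distillation step (hypothetical prompt template
# and helper names; the paper's implementation is at https://github.com/sail-sg/sdft).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # seed model assumed for this sketch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

# Hypothetical rewrite prompt: ask the seed model to restate the reference answer
# so the training target stays close to the model's own output distribution.
DISTILL_TEMPLATE = (
    "Below is an instruction and a reference answer. "
    "Rewrite the reference answer in your own words while keeping it correct.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Reference answer:\n{reference}\n\n"
    "### Rewritten answer:\n"
)

def distill_example(instruction: str, reference: str, max_new_tokens: int = 512) -> str:
    """Generate a self-distilled target for one task example."""
    prompt = DISTILL_TEMPLATE.format(instruction=instruction, reference=reference)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens (the distilled response).
    generated = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

# The distilled dataset then replaces the original targets for ordinary supervised
# fine-tuning, e.g. [(x, distill_example(x, y)) for x, y in task_dataset].
```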
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 34.4 | 983 |
| Code Generation | HumanEval | Pass@1 | 18.3 | 850 |
| Multi-task Language Understanding | MMLU | Accuracy | 57.67 | 842 |
| Language Understanding | MMLU | Accuracy | 84.13 | 756 |
| Question Answering | ARC Challenge | Accuracy | 82.42 | 749 |
| Mathematical Reasoning | MATH | Accuracy | 7.34 | 643 |
| Reasoning | BBH | Accuracy | 71.01 | 507 |
| Question Answering | ARC Easy | Normalized Accuracy | 90.91 | 385 |
| Mathematical Reasoning | SVAMP | Accuracy | 18.2 | 368 |
| Reading Comprehension | RACE high | Accuracy | 74.59 | 295 |