Model Extrapolation Expedites Alignment

About

Given the high computational cost of preference alignment training of large language models (LLMs), exploring efficient methods to reduce the training overhead remains an important and compelling research problem. Motivated by the observation that alignment training typically involves only small parameter changes without injecting new knowledge into models, we propose a straightforward method called ExPO (model extrapolation) to expedite LLMs' alignment with human preferences. Given a partially-trained model and its initial SFT checkpoint, ExPO improves the implicit optimization objective of alignment training by simply amplifying the parameter change based on a first-order approximation, without any additional training overhead. Through controlled experiments, we demonstrate that ExPO boosts a DPO model trained with only 20% of the steps to outperform the fully-trained one. Moreover, we show that ExPO notably improves existing open-source LLMs (ranging from 1.8B to 70B parameters) on the leading AlpacaEval 2.0 and MT-Bench benchmarks, which highlights ExPO's broader utility in efficiently enhancing LLM alignment.

Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng • 2024
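
The abstract describes ExPO as amplifying the parameter change between the initial SFT checkpoint and the partially-trained model. A minimal sketch of that idea follows, assuming the amplification is a linear extrapolation of the form aligned + alpha * (aligned - sft). The function name `extrapolate`, the coefficient name `alpha`, and the toy checkpoints (plain dicts of float lists standing in for real model weights) are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List

# A toy "checkpoint": parameter name -> flat list of weights.
# Real checkpoints hold tensors; plain lists keep the sketch self-contained.
StateDict = Dict[str, List[float]]


def extrapolate(sft: StateDict, aligned: StateDict, alpha: float) -> StateDict:
    """Return aligned + alpha * (aligned - sft), parameter by parameter.

    alpha is an assumed hyperparameter: alpha = 0 returns the aligned
    checkpoint unchanged, larger values amplify the change made by
    alignment training.
    """
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must share the same parameter names")

    result: StateDict = {}
    for name, w_aligned in aligned.items():
        w_sft = sft[name]
        if len(w_sft) != len(w_aligned):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Move further along the direction from the SFT weights
        # to the aligned weights.
        result[name] = [a + alpha * (a - s) for s, a in zip(w_sft, w_aligned)]
    return result


# Toy example with made-up numbers.
sft = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
aligned = {"layer.weight": [1.5, 2.0], "layer.bias": [-0.25]}
expo = extrapolate(sft, aligned, alpha=0.5)
print(expo)
```

No gradient computation is involved: the step is a single pass over two stored checkpoints, which is consistent with the abstract's claim of no additional training overhead. How to choose the coefficient is not specified in the abstract, so `alpha=0.5` here is arbitrary.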

Related benchmarks

Task	Dataset	Result
Instruction Following	AlpacaEval 2.0	Win Rate 46.2	722
Multi-turn Dialogue Evaluation	MT-Bench	Overall Score 8.45	532
Code Generation	HumanEval+	--	393
Code Generation	MBPP+	Accuracy 72	236
Math Reasoning	AIME 2025	Accuracy 55.2	49
Mathematical Reasoning	Math Reasoning Suite Average	Average Accuracy 45.8	49
Mathematical Reasoning	HMMT Feb 2025	Accuracy 32.4	45
Mathematical Reasoning	AIME 26	Accuracy 20	41
Mathematical Reasoning	AMC23	Accuracy 67.8	38
Math Reasoning	AIME 2024	Accuracy 0.587	37

