
Model Extrapolation Expedites Alignment

About

Given the high computational cost of preference alignment training for large language models (LLMs), exploring efficient methods to reduce the training overhead remains an important and compelling research problem. Motivated by the observation that alignment training typically involves only small parameter changes without injecting new knowledge into models, we propose a straightforward method called ExPO (model extrapolation) to expedite LLMs' alignment with human preferences. Given a partially-trained model and its initial SFT checkpoint, ExPO improves the implicit optimization objective of alignment training by simply amplifying the parameter change based on a first-order approximation, without any additional training overhead. Through controlled experiments, we demonstrate that ExPO boosts a DPO model trained with only 20% of the training steps to outperform the fully-trained one. Moreover, we show that ExPO notably improves existing open-source LLMs (ranging from 1.8B to 70B parameters) on the leading AlpacaEval 2.0 and MT-Bench benchmarks, which highlights ExPO's broader utility in efficiently enhancing LLM alignment.
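The "amplifying the parameter change" step can be sketched as a simple linear extrapolation in weight space, assuming the update takes the form θ_expo = θ_sft + α·(θ_aligned − θ_sft) with α > 1. This is a minimal illustration, not the paper's reference implementation; the function name and the choice of α here are illustrative.

```python
def extrapolate(sft_weights, aligned_weights, alpha=2.0):
    """Amplify the SFT -> aligned parameter change by a factor alpha.

    Assumed update rule (illustrative): theta_expo = theta_sft
    + alpha * (theta_aligned - theta_sft). With alpha = 1 this
    returns the aligned weights unchanged; alpha > 1 extrapolates
    further along the alignment direction at no training cost.
    Weights are represented as plain name -> value mappings; in
    practice these would be model state dicts of tensors.
    """
    return {
        name: sft + alpha * (aligned_weights[name] - sft)
        for name, sft in sft_weights.items()
    }
```

Because the operation is a single element-wise pass over the checkpoints, it adds no training overhead, which is the point of the method as described in the abstract.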

Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-turn Dialogue Evaluation | MT-Bench | Overall Score | 8.45 | 331 |
| Instruction Following | AlpacaEval 2.0 | LC Win Rate | 37.8 | 281 |
| Code Generation | HumanEval+ | -- | -- | 189 |
| Code Generation | MBPP+ | Accuracy | 72 | 75 |
| Math Reasoning | AIME 2024 | Accuracy | 0.587 | 37 |
| Math Reasoning | AIME 2025 | Accuracy | 55.2 | 33 |
| Mathematical Reasoning | Math Reasoning Suite Average | Average Accuracy | 45.8 | 27 |
| Mathematical Reasoning | HMMT Feb 2025 | Accuracy | 32.4 | 23 |
| Mathematical Reasoning | HMMT Nov 2025 | Accuracy | 37 | 18 |
| Code Generation | LiveCodeBench (LCB) | Accuracy | 29 | 9 |
(10 of 11 rows shown)

Other info

Code
