Model Extrapolation Expedites Alignment

About

Given the high computational cost of preference alignment training of large language models (LLMs), finding efficient ways to reduce training overhead remains an important and compelling research problem. Motivated by the observation that alignment training typically involves only small parameter changes and does not inject new knowledge into models, we propose a straightforward method called ExPO (model extrapolation) to expedite LLMs' alignment with human preferences. Given a partially trained model and its initial SFT checkpoint, ExPO improves the implicit optimization objective of alignment training by simply amplifying the parameter change based on a first-order approximation, without any additional training overhead. Through controlled experiments, we demonstrate that ExPO boosts a DPO model trained with only 20% of the training steps to outperform the fully trained one. Moreover, we show that ExPO notably improves existing open-source LLMs (ranging from 1.8B to 70B parameters) on the leading AlpacaEval 2.0 and MT-Bench benchmarks, which highlights ExPO's broader utility in efficiently enhancing LLM alignment.
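The extrapolation step described above can be sketched in a few lines. This is a minimal illustration, assuming ExPO is applied parameter by parameter as theta_expo = theta_sft + (1 + alpha) * (theta_aligned - theta_sft), which is the same as theta_aligned + alpha * (theta_aligned - theta_sft). Here theta_sft is the SFT checkpoint, theta_aligned is the partially trained alignment (e.g. DPO) checkpoint, and alpha > 0 is a scalar amplification factor. The function name `expo_extrapolate` and its dict-of-parameters interface are hypothetical and are not the authors' released code.

```python
from typing import Any, Mapping


def expo_extrapolate(
    sft_params: Mapping[str, Any],
    aligned_params: Mapping[str, Any],
    alpha: float,
) -> dict[str, Any]:
    """Hypothetical sketch: push aligned weights further away from the SFT weights.

    For every parameter name, returns
        aligned + alpha * (aligned - sft)
    i.e. the SFT weights plus the alignment update amplified by (1 + alpha).
    Values may be floats, NumPy arrays, or torch tensors: anything that
    supports +, - and multiplication by a Python float.
    """
    if alpha < 0:
        # alpha = 0 returns the aligned model unchanged; this sketch only amplifies.
        raise ValueError("alpha must be non-negative")
    if sft_params.keys() != aligned_params.keys():
        raise ValueError("both checkpoints must have the same parameter names")
    return {
        name: aligned_params[name] + alpha * (aligned_params[name] - sft_params[name])
        for name in aligned_params
    }


# Toy usage with scalar "parameters" (values are illustrative only).
sft = {"w": 1.0, "b": 0.0}
aligned = {"w": 1.5, "b": -0.25}
print(expo_extrapolate(sft, aligned, alpha=0.5))  # {'w': 1.75, 'b': -0.375}
```

With PyTorch, the same function could in principle be applied to the `state_dict()` mappings of two identically shaped models and the result passed to `load_state_dict()`. Note that this sketch does not skip non-floating-point buffers, and alpha is a hyperparameter that has to be tuned; the 0.5 above is arbitrary.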

Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng• 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Instruction Following | AlpacaEval 2.0 | Win Rate | 46.2 | 722 |
| Multi-turn Dialogue Evaluation | MT-Bench | Overall Score | 8.45 | 532 |
| Code Generation | HumanEval+ | -- | -- | 393 |
| Code Generation | MBPP+ | Accuracy | 72 | 236 |
| Math Reasoning | AIME 2025 | Accuracy | 55.2 | 49 |
| Mathematical Reasoning | Math Reasoning Suite Average | Average Accuracy | 45.8 | 49 |
| Mathematical Reasoning | HMMT Feb 2025 | Accuracy | 32.4 | 45 |
| Mathematical Reasoning | AIME 26 | Accuracy | 20 | 41 |
| Mathematical Reasoning | AMC23 | Accuracy | 67.8 | 38 |
| Math Reasoning | AIME 2024 | Accuracy | 0.587 | 37 |

10 of 15 rows shown.
