
Self-Steering Optimization: Autonomous Preference Optimization for Large Language Models

About

The key to effective alignment lies in high-quality preference data. Recent research has focused on automated alignment, i.e., building alignment systems with minimal human intervention. However, prior work has concentrated on data generation methods while paying insufficient attention to quality control, so the generated data are often inaccurate or unhelpful, yielding unpredictable benefits during iterative optimization. In this paper, we present Self-Steering Optimization ($SSO$), an algorithm that autonomously generates high-quality preference data, eliminating the need for manual annotation. $SSO$ employs a specialized optimization objective to build a data generator from the policy model itself, which then produces accurate, on-policy data. We demonstrate $SSO$'s effectiveness through comprehensive experiments on two model series, Llama 3 and Qwen 2. Evaluation across diverse benchmarks shows that $SSO$ consistently outperforms baselines in human preference alignment and reward optimization. Further analysis validates $SSO$ as a scalable framework for preference optimization, benefiting the advancement of automated alignment techniques.
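The abstract does not spell out $SSO$'s specialized objective, but preference optimization on (chosen, rejected) pairs is commonly formulated as a DPO-style loss. The sketch below is an illustrative, generic implementation of that loss on a single pair, not the authors' actual objective; the function name and toy log-probability values are assumptions for the example.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Generic DPO-style preference loss on one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the policy
    and a frozen reference model. The loss decreases as the policy prefers
    the chosen response more strongly than the reference model does.
    """
    # Implicit reward margin between chosen and rejected responses.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the margin is large and positive.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the policy already favors the chosen response relative
# to the reference, so the loss is below log(2) (the zero-margin value).
loss = dpo_loss(logp_chosen=-10.0, logp_rejected=-12.0,
                ref_logp_chosen=-11.0, ref_logp_rejected=-11.0, beta=0.1)
```

With on-policy data, both responses come from the current policy itself, which is what a generator built "from the policy model itself" would provide to this kind of objective.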

Hao Xiang, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Ben He, Le Sun, Jingren Zhou, Junyang Lin • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Reasoning | GSM8K | Accuracy: 84.7 | 983 |
| Multi-task Language Understanding | MMLU | Accuracy: 71.2 | 842 |
| Multi-turn Dialogue Evaluation | MT-Bench | Overall Score: 8.66 | 331 |
| Reward Modeling | RewardBench | Avg Score: 86.3 | 118 |
| Instruction Following and Helpfulness Evaluation | AlpacaEval 2.0 | Win Rate: 49.4 | 58 |
| Advanced Mathematical Problem Solving | MATH | Accuracy: 52.3 | 41 |
| Constraint-following Instruction Evaluation | IFEval | Average Score: 53.4 | 16 |
