Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

About

The success of AI assistants based on Large Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by complex annotation and training requirements. This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to diverse human preferences. In this work, we introduce Linear Alignment, a novel algorithm that aligns language models with human preferences in a single inference step, eliminating the reliance on data annotation and model training. Linear Alignment incorporates a new parameterization for policy optimization under divergence constraints, which enables the extraction of the optimal policy in closed form and facilitates the direct estimation of the aligned response. Extensive experiments on both general and personalized preference datasets demonstrate that Linear Alignment significantly enhances the performance and efficiency of LLM alignment across diverse scenarios. Our code and dataset are published at https://github.com/Wizardcoast/Linear_Alignment.git.
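
The abstract does not state the closed-form update itself. As a rough illustration of what alignment in a single inference step, without training, could look like, the toy sketch below assumes (this is not confirmed by the text above) that next-token logits are shifted along a unit-norm direction obtained by contrasting logits computed with and without a preference prompt. The function names, the `step` parameter, and the example numbers are all hypothetical; this is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shift_logits(base_logits, pref_logits, step=1.0):
    """Hypothetical one-step adjustment (an assumption, not the paper's formula):
    move the base logits a distance `step` along the normalized direction
    (pref_logits - base_logits)."""
    delta = [p - b for p, b in zip(pref_logits, base_logits)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm == 0.0:
        # No preference signal: leave the base logits unchanged.
        return list(base_logits)
    return [b + step * d / norm for b, d in zip(base_logits, delta)]


# Made-up logits over a three-token vocabulary.
base = [2.0, 1.0, 0.5]   # model output without a preference prompt
pref = [1.5, 2.0, 0.5]   # model output with a preference prompt
aligned = shift_logits(base, pref, step=2.0)

p_base = softmax(base)
p_aligned = softmax(aligned)
```

In this toy setup the token favored by the preference-conditioned logits gains probability mass after the shift, and no gradient step or annotated data is involved, which is the property the abstract emphasizes.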

Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin • 2024

Related benchmarks

Task	Dataset	Metric	Result	
Personalization	Personal	Creative Score (ArmoRM)	0.985	33
Multi-objective personalized alignment	Multifaceted dataset (test)	AMR	72	28
Multi-objective personalized alignment	Multifaceted	AMR	46	28
Personalization	HelpSteer	Creative ArmoRM Score	0.44	18
Personalization	Ultra Chat	Creative ArmoRM Score	42	18
Personalization	Truthful QA	Creative Score (ArmoRM)	41	18
Test-Time Personalization	HelpSteer	Creative Win Rate	98.1	15
Test-Time Personalization	Truthful QA	Creative Win Rate	97.2	15
