
Drift: Decoding-time Personalized Alignments with Implicit User Preferences

About

Personalized alignments for individual users have been a long-standing goal in large language models (LLMs). We introduce Drift, a novel framework that personalizes LLMs at decoding time with implicit user preferences. Traditional Reinforcement Learning from Human Feedback (RLHF) requires thousands of annotated examples and expensive gradient updates. In contrast, Drift personalizes LLMs in a training-free manner, using only a few dozen examples to steer a frozen model through efficient preference modeling. Our approach models user preferences as a composition of predefined, interpretable attributes and aligns them at decoding time to enable personalized generation. Experiments on both a synthetic persona dataset (Perspective) and a real human-annotated dataset (PRISM) demonstrate that Drift significantly outperforms RLHF baselines while using only 50-100 examples. Our results and analysis show that Drift is both computationally efficient and interpretable.
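To make the idea concrete, below is a minimal sketch of decoding-time steering with a composed preference model. Everything here is an assumption for exposition, not the paper's exact formulation: the attribute set, the toy logit functions, the linear logit-offset composition, and the logistic fitting of per-attribute weights from a few dozen preference pairs are all illustrative stand-ins for Drift's "efficient preference modeling" over interpretable attributes.

```python
# Illustrative sketch only: function names, the attribute set, and the linear
# logit-composition rule are assumptions, not the paper's exact method.
import numpy as np

VOCAB = ["the", "a", "brief", "detailed", "friendly", "formal", "<eos>"]

def base_logits(prefix):
    """Stand-in for a frozen LLM's next-token logits (toy deterministic table)."""
    rng = np.random.default_rng(abs(hash(tuple(prefix))) % (2**32))
    return rng.normal(size=len(VOCAB))

def attribute_logits(prefix, attribute):
    """Stand-in for the same frozen model conditioned on one interpretable
    attribute (e.g. 'concise'); no gradient updates are ever taken."""
    rng = np.random.default_rng(abs(hash((attribute,) + tuple(prefix))) % (2**32))
    return rng.normal(size=len(VOCAB))

ATTRIBUTES = ["concise", "formal", "friendly"]  # hypothetical attribute set

def fit_user_weights(preference_pairs, lr=0.1, steps=200):
    """Fit per-attribute weights from (chosen, rejected) examples with a simple
    Bradley-Terry-style logistic objective -- a cheap stand-in for training-free
    preference modeling from ~50 examples."""
    w = np.zeros(len(ATTRIBUTES))
    for _ in range(steps):
        grad = np.zeros_like(w)
        for feats_chosen, feats_rejected in preference_pairs:
            diff = feats_chosen - feats_rejected       # per-attribute score gap
            p = 1.0 / (1.0 + np.exp(-(w @ diff)))      # P(chosen is preferred)
            grad += (p - 1.0) * diff                   # gradient of -log p
        w -= lr * grad / max(len(preference_pairs), 1)
    return w

def drift_step(prefix, w):
    """One decoding step: add the weighted attribute logit offsets to the
    frozen base logits, then pick the next token greedily."""
    logits = base_logits(prefix)
    for weight, attr in zip(w, ATTRIBUTES):
        # offset = how much conditioning on this attribute shifts each token
        logits += weight * (attribute_logits(prefix, attr) - base_logits(prefix))
    return VOCAB[int(np.argmax(logits))]

# A few dozen examples; here each side is a per-attribute score vector.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=3), rng.normal(size=3)) for _ in range(50)]
w = fit_user_weights(pairs)

prefix, out = ["the"], []
for _ in range(5):
    tok = drift_step(prefix, w)
    out.append(tok)
    prefix.append(tok)
print("weights:", np.round(w, 3), "| sample:", " ".join(out))
```

The design point the sketch tries to capture is that the base model stays frozen: only a small weight vector over interpretable attributes is estimated per user, and generation is steered purely at decoding time.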

Minbeom Kim, Kang-il Lee, Seongho Joo, Hwaran Lee, Thibaut Thonet, Kyomin Jung • 2025

Related benchmarks

Task                              Dataset                                Result               Rank
Preference Prediction             PRISM (test)                           Accuracy 61.31       51
Personalized Preference Modeling  Summarize from Human Feedback (test)   Mean Accuracy 58.68  12
Personalized Summarization        Summarize from Human Feedback (test)   Win Rate 68.88       9
