
Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment

About

Aligning language models with human preferences presents significant challenges, particularly in achieving personalization without incurring excessive computational costs. Existing methods rely on reward signals and additional annotated data, limiting their scalability and adaptability to diverse human values. To address these challenges, we introduce Persona-judge, a novel discriminative paradigm that enables training-free personalized alignment with unseen preferences. Instead of optimizing policy parameters through external reward feedback, Persona-judge leverages the intrinsic preference-judgment capabilities of the model. Specifically, a draft model generates candidate tokens conditioned on a given preference, while a judge model, embodying another preference, cross-validates whether each predicted token should be accepted. Experimental results demonstrate that Persona-judge, using the model's inherent preference-evaluation mechanisms, offers a scalable and computationally efficient solution to personalized alignment, paving the way for more adaptive, customized alignment. Our code is available here.
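The abstract's draft-and-judge loop can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: both "models" are hypothetical toy token scorers standing in for preference-conditioned LLMs, and the acceptance rule (a simple score threshold with a judge-side fallback) is an assumption for illustration only.

```python
def make_toy_model(preference):
    """Return a toy scorer: higher score = more preferred token.

    Hypothetical stand-in for an LLM's token probability under a
    preference-conditioned prompt; deterministic so the sketch runs anywhere.
    """
    def score(context, token):
        return ((sum(map(ord, token)) * (len(context) + 1)
                 + len(preference)) % 100) / 100.0
    return score


def persona_judge_decode(draft_score, judge_score, vocab,
                         max_len=8, threshold=0.5):
    """Sketch of token-level self-judgment decoding.

    The draft proposes the token it prefers; the judge accepts it only if
    that token also scores above `threshold` under the judge's preference.
    Otherwise the judge substitutes its own top token (a simple fallback;
    the paper's actual acceptance criterion may differ).
    """
    context = []
    for _ in range(max_len):
        candidate = max(vocab, key=lambda t: draft_score(context, t))
        if judge_score(context, candidate) >= threshold:
            context.append(candidate)  # cross-validated: accepted as-is
        else:
            context.append(max(vocab, key=lambda t: judge_score(context, t)))
    return context
```

Usage: build two scorers with different preferences, e.g. `persona_judge_decode(make_toy_model("helpful"), make_toy_model("concise"), ["a", "b", "c"])`, which returns a token sequence that both preferences have jointly shaped, without any parameter updates.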

Xiaotian Zhang, Ruizhe Chen, Yang Feng, Zuozhu Liu • 2025

Related benchmarks

Task                  Dataset        Metric            Result  Rank
Preference Alignment  Psoups (test)  Helpfulness (RM)  1.29    13

Other info

Code
