
P-Check: Advancing Personalized Reward Model via Learning to Generate Dynamic Checklist

About

Recent approaches in personalized reward modeling have primarily focused on leveraging user interaction history to align model judgments with individual preferences. However, existing methods largely treat user context as a static or implicit conditioning signal, failing to capture the dynamic and multi-faceted nature of human judgment. In this paper, we propose P-Check, a novel personalized reward modeling framework designed to train a plug-and-play checklist generator that synthesizes dynamic evaluation criteria to guide reward prediction. To better align these checklists with personalized nuances, we introduce Preference-Contrastive Criterion Weighting, a training strategy that assigns saliency scores to criteria based on their discriminative power for personalized judgment. Extensive experiments demonstrate that P-Check not only improves reward accuracy but also enhances downstream personalized generation, and remains robust in out-of-distribution (OOD) scenarios.
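To make the framing concrete, here is a minimal sketch of how a dynamic checklist with saliency-weighted criteria could be aggregated into a scalar reward. All names (`Criterion`, `checklist_reward`, the example criteria and weights) are illustrative assumptions, not the paper's actual implementation; in P-Check the checklist is produced by a trained generator and the saliency scores come from Preference-Contrastive Criterion Weighting.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    text: str        # natural-language evaluation criterion
    saliency: float  # weight reflecting discriminative power for this user

def checklist_reward(criteria: list[Criterion], satisfied: list[float]) -> float:
    """Aggregate per-criterion judgments (0..1) into a scalar reward,
    weighting each criterion by its saliency score."""
    total = sum(c.saliency for c in criteria)
    if total == 0:
        return 0.0
    return sum(c.saliency * s for c, s in zip(criteria, satisfied)) / total

# Toy checklist a generator might emit for one user/query pair.
checklist = [
    Criterion("Answer is concise", 0.8),
    Criterion("Cites sources", 0.5),
    Criterion("Matches the user's formal tone", 1.0),
]
# Judgments of whether a candidate response satisfies each criterion.
reward = checklist_reward(checklist, [1.0, 0.0, 1.0])
```

Because the checklist is generated per user and query, the same response can receive different rewards for different users, which is the behavior a static conditioning signal cannot express.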

Kwangwook Seo, Dongha Lee • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Personalized Reward Modeling | PRISM Personalized | Accuracy | 65.11 | 44 |
| Personalized Reward Modeling | Chatbot Arena Personalized | Accuracy | 61.56 | 42 |
| Personalized Generation | BESPOKE | R-L | 9.94 | 18 |
| Personalized Reward Modeling | BESPOKE-Meta OOD | Binary Preference Accuracy | 75.48 | 18 |
