
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

About

Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment. Prevailing RLHF approaches reduce this structure to scalar or pairwise labels, collapsing nuanced preferences into opaque parametric proxies and exposing vulnerabilities to reward hacking. While recent Rubrics-as-Reward (RaR) methods attempt to recover this structure through explicit criteria, generating rubrics that are simultaneously reliable, scalable, and data-efficient remains an open problem. We introduce Auto-Rubric as Reward (ARR), a framework that reframes reward modeling from implicit weight optimization to explicit, criteria-based decomposition. Before any pairwise comparison, ARR externalizes a VLM's internalized preference knowledge as prompt-specific rubrics, translating holistic intent into independently verifiable quality dimensions. This conversion of implicit preference structure into inspectable, interpretable constraints substantially suppresses evaluation biases, including positional bias, and enables both zero-shot deployment and few-shot conditioning on minimal supervision. To extend these gains into generative training, we propose Rubric Policy Optimization (RPO), which distills ARR's structured multi-dimensional evaluation into a robust binary reward, replacing opaque scalar regression with rubric-conditioned preference decisions that stabilize policy gradients. On text-to-image generation and image editing benchmarks, ARR-RPO outperforms pairwise reward models and VLM judges. These results indicate that explicitly externalizing implicit preference knowledge into structured rubrics yields more reliable, data-efficient multimodal alignment, and that the bottleneck is the absence of a factorized interface, not a deficit of knowledge.
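
The pipeline described above (prompt-specific rubric first, then a rubric-conditioned pairwise decision collapsed into a binary reward) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: `make_rubric`, `keyword_judge`, `rubric_score`, and `binary_reward` are hypothetical names, the "VLM" is replaced by a trivial substring check over text descriptions of the candidates, and the tie-handling rule is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical judge signature: (criterion, candidate description) -> satisfied?
Judge = Callable[[str, str], bool]


@dataclass(frozen=True)
class Rubric:
    prompt: str
    criteria: Tuple[str, ...]


def make_rubric(prompt: str) -> Rubric:
    """Stand-in for a VLM rubric generator (assumption):
    each comma-separated phrase of the prompt becomes one criterion."""
    parts = tuple(p.strip() for p in prompt.split(",") if p.strip())
    return Rubric(prompt=prompt, criteria=parts)


def keyword_judge(criterion: str, candidate: str) -> bool:
    """Toy judge (assumption): a criterion counts as met when its text
    appears verbatim, case-insensitively, in the candidate description."""
    return criterion.lower() in candidate.lower()


def rubric_score(rubric: Rubric, candidate: str, judge: Judge) -> int:
    """Count how many rubric criteria the candidate satisfies."""
    return sum(judge(c, candidate) for c in rubric.criteria)


def binary_reward(
    rubric: Rubric, cand_a: str, cand_b: str, judge: Judge = keyword_judge
) -> Tuple[int, int]:
    """Return (reward_a, reward_b): 1 for the preferred candidate, 0 for the
    other. Ties yield (0, 0); this tie rule is an assumption of the sketch."""
    score_a = rubric_score(rubric, cand_a, judge)
    score_b = rubric_score(rubric, cand_b, judge)
    if score_a == score_b:
        return (0, 0)
    return (1, 0) if score_a > score_b else (0, 1)


rubric = make_rubric("a red cube, a blue sphere, on a wooden table")
good = "A red cube and a blue sphere on a wooden table"
bad = "A red cube on a glass shelf"
reward_ab = binary_reward(rubric, good, bad)
reward_ba = binary_reward(rubric, bad, good)
```

In this toy version each candidate is scored against the rubric independently, so swapping the presentation order only swaps the rewards. That property holds here by construction and is not a claim about how ARR itself handles positional bias.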

Juanxi Tian, Fengyuan Liu, Jiaming Han, Yilei Jiang, Yongliang Wu, Yesheng Liu, Haodong Li, Furong Xu, Wanhua Li • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval | Overall Score (GenEval) | 0.8 | 153 |
| Image Editing | ImgEdit | ImgEdit | 4.43 | 62 |
| Text-to-Image Generation | TIIF | TIIF Overall Score | 76.85 | 36 |
| Human preference prediction | HPD v3 | Accuracy | 78.3 | 21 |
| Image Editing | GEdit-Bench | GEdit-Bench Score | 7.85 | 19 |
| Text-to-Image Generation | UniGenBench++ | Score (Short) | 65.89 | 16 |
| Image editing preference evaluation | EditReward-Bench | Accuracy | 63.27 | 14 |
| Human Preference Agreement | MM-RewardBench2 T2I | Accuracy | 78.9 | 13 |
| Text-to-Image Preference Evaluation | HPD v3 (test) | Accuracy | 78.3 | 11 |
| Text-to-Image Preference Evaluation | MM-RewardBench T2I 2 | Accuracy | 78.9 | 11 |
Showing 10 of 14 rows
