
Unified Reward Model for Multimodal Understanding and Generation

About

Recent advances in human preference alignment have significantly improved multimodal generation and understanding. A key approach is to train reward models that provide supervision signals for preference optimization. However, existing reward models are often task-specific, limiting their adaptability across diverse visual applications. We further argue that a reward model that jointly learns to assess multiple vision tasks may foster a synergistic effect: improved image understanding enhances image-generation assessment, and refined image evaluation benefits video assessment through better frame analysis. To this end, this paper proposes UnifiedReward, the first unified reward model for multimodal understanding and generation assessment. It supports both pairwise ranking and pointwise scoring, providing effective reward signals for vision-model preference alignment. Specifically, (1) we first train UnifiedReward on our constructed large-scale human preference dataset, which covers both image and video generation/understanding tasks; (2) we then leverage it to automatically construct high-quality pairwise preference data from vision models by progressively filtering their outputs through our two-stage strategy, i.e., pair ranking and point sifting; (3) finally, we use these data to align vision models with human preferences via Direct Preference Optimization (DPO). Experimental results show that jointly learning to assess diverse visual tasks yields substantial mutual benefits. We further apply our pipeline to both vision understanding and generation, achieving consistent improvements across each domain.
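The two-stage data-construction strategy in step (2) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the reward-model interface (`rank_pair`, `score`) and the `margin` threshold are hypothetical stand-ins for whatever the authors actually use.

```python
# Hypothetical sketch of the two-stage preference-pair construction
# (pair ranking, then point sifting) described in the abstract.
from itertools import combinations

def build_preference_pairs(outputs, rank_pair, score, margin=0.5):
    """Return (chosen, rejected) pairs for DPO from a list of model outputs.

    Stage 1 (pair ranking): rank_pair(a, b) returns the preferred of the two.
    Stage 2 (point sifting): score(x) returns a scalar reward; a pair is kept
    only if the pointwise score gap exceeds `margin`, sifting out pairs whose
    ranking is likely noisy.
    """
    pairs = []
    for a, b in combinations(outputs, 2):
        chosen = rank_pair(a, b)                      # stage 1: coarse ranking
        rejected = b if chosen is a else a
        if score(chosen) - score(rejected) > margin:  # stage 2: pointwise sift
            pairs.append((chosen, rejected))
    return pairs
```

The surviving (chosen, rejected) pairs would then feed a standard DPO objective, which raises the policy's relative log-likelihood of the chosen output over the rejected one.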

Yibin Wang, Yuhang Zang, Hao Li, Cheng Jin, Jiaqi Wang • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Text-to-Video Generation | VBench | Quality Score | 85.46 | 111
Reward Modeling | VLRewardBench (test) | General | 60.6 | 24
Human Preference Evaluation | HPD v2 (test) | Preference Accuracy | 83.1 | 18
Human Preference Evaluation | ImageReward (test) | Preference Accuracy | 0.6382 | 18
Human Preference Alignment | REACT-Video | Acc (Tie, Overall) | 41.6 | 12
Pairwise Preference | GenAI Bench (test) | Accuracy | 72.38 | 11
Video Preference Alignment | GenAI-Bench | Alignment Accuracy (w/ ties) | 54.8 | 11
Pairwise Preference | HPD v3 (test) | Accuracy | 71.96 | 11
Image Generation Assessment | GenAI-Bench Image (test) | Accuracy | 71.5 | 8
Image Generation Assessment | MMRB2 (test) | Accuracy | 60 | 8
Showing 10 of 23 rows
