DyCoRM: Dynamic Criterion-Aware Reward Modeling for Text-to-Image Generation

About

With the continued advancement of text-to-image (T2I) generation, producing high-quality images is becoming increasingly attainable; consequently, user demands are shifting toward images that better satisfy their specific requirements. As reward models play an increasingly important role in assessing whether generated images align with user preferences, this trend introduces an important challenge for reward modeling: rather than relying solely on static, general evaluation dimensions, reward models should account for the task-relevant, fine-grained criteria through which users assess whether generated images meet their specific requirements. To address this challenge, we propose DyCoRM, a dynamic, criterion-aware reward model that grounds task-relevant criteria and performs criterion-aware preference comparison. To support this setting, we construct DyCoDataset-20K, which provides dynamic criteria together with criterion-level annotations, and further derive DyCoBench-1K, a benchmark for systematically evaluating reward models under dynamic criteria. We further introduce DyCoPick, which applies criterion-aware reward modeling to selecting T2I images. Our contributions establish the first reward modeling framework for dynamic and fine-grained evaluation and practical application in T2I generation.

Jiaying Qian, Ziheng Jia, Qian Zhang, Zicheng Zhang, Jiayi Guo, Junqi Zhang, Guangtao Zhai, Xiongkuo Min • 2026

Related benchmarks

Task	Dataset	Result
Human Preference Prediction	HPD v2	Accuracy 85.1	25
Pairwise Preference Prediction	DyCoBench-1K Single Criterion	P(A > B) 70.2	17
Pairwise Preference Prediction	DyCoBench-1K Multiple Criteria	Preference Rate (A > B) 65.6	17
Pairwise Preference Prediction	DyCoBench-1K Overall Preference	Preference Rate (A > B) 78.3	17
Text-to-Image Preference Prediction	Pick-a-Pic	Accuracy 73.4	17
Text-to-Image Preference Prediction	HPD v3	Accuracy 77.2	17
Text-to-Image Preference Prediction	Cross-domain Aggregate	Average Accuracy 77	17
Text-to-Image Preference Prediction	ImageReward	Accuracy 67.2	17
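
For readers unfamiliar with the metric, the accuracy figures above are presumably the usual pairwise agreement rate: the share of image pairs for which the reward model ranks the human-preferred image higher. The sketch below illustrates that computation under this assumption. The function name `pairwise_accuracy`, its arguments, and the toy scores are hypothetical and are not taken from the DyCoRM paper or its benchmarks.

```python
from typing import Sequence


def pairwise_accuracy(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    human_prefers_a: Sequence[bool],
) -> float:
    """Fraction of pairs where the higher-scored image matches the human choice.

    Assumption: a tie (score_a == score_b) is treated as predicting image B.
    """
    if not (len(scores_a) == len(scores_b) == len(human_prefers_a)):
        raise ValueError("all inputs must have the same length")
    if not scores_a:
        raise ValueError("need at least one pair")
    correct = sum(
        (a > b) == prefers_a
        for a, b, prefers_a in zip(scores_a, scores_b, human_prefers_a)
    )
    return correct / len(scores_a)


# Toy data for illustration only; not taken from any benchmark.
scores_a = [0.9, 0.2, 0.6, 0.4]
scores_b = [0.1, 0.8, 0.7, 0.3]
human_prefers_a = [True, False, True, True]

accuracy = pairwise_accuracy(scores_a, scores_b, human_prefers_a)
print(f"pairwise accuracy: {accuracy:.2f}")
```

In this toy example the model agrees with the human label on three of the four pairs. A criterion-level variant would apply the same function to the scores and labels collected for a single criterion.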
