DyCoRM: Dynamic Criterion-Aware Reward Modeling for Text-to-Image Generation
About
As text-to-image (T2I) generation advances, high-quality images are becoming easier to produce, and user demand is shifting toward images that satisfy specific requirements. Reward models play an increasingly important role in assessing whether generated images align with user preferences, so this shift poses a challenge for reward modeling: rather than relying solely on static, general evaluation dimensions, reward models should account for the task-relevant, fine-grained criteria by which users judge whether a generated image meets their requirements. To address this challenge, we propose DyCoRM, a dynamic, criterion-aware reward model that grounds task-relevant criteria and performs criterion-aware preference comparison. To support this setting, we construct DyCoDataset-20K, which provides dynamic criteria together with criterion-level annotations, and further derive DyCoBench-1K, a benchmark for systematically evaluating reward models under dynamic criteria. We also introduce DyCoPick, which applies criterion-aware reward modeling to the selection of T2I images. Together, these contributions establish the first reward modeling framework for dynamic, fine-grained evaluation and its practical application in T2I generation.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Human Preference Prediction | HPD v2 | Accuracy | 85.1 | 25 |
| Pairwise Preference Prediction | DyCoBench-1K Single Criterion | P(A > B) | 70.2 | 17 |
| Pairwise Preference Prediction | DyCoBench-1K Multiple Criteria | Preference Rate (A > B) | 65.6 | 17 |
| Pairwise Preference Prediction | DyCoBench-1K Overall Preference | Preference Rate (A > B) | 78.3 | 17 |
| Text-to-Image Preference Prediction | Pick-a-Pic | Accuracy | 73.4 | 17 |
| Text-to-Image Preference Prediction | HPD v3 | Accuracy | 77.2 | 17 |
| Text-to-Image Preference Prediction | Cross-domain Aggregate | Average Accuracy | 77 | 17 |
| Text-to-Image Preference Prediction | ImageReward | Accuracy | 67.2 | 17 |
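
For readers unfamiliar with the metric, pairwise preference accuracy of the kind reported above is commonly computed as the share of image pairs on which a reward model ranks the human-preferred image higher. The sketch below is a minimal illustration under that assumption, not the evaluation code used for these results; the function name `pairwise_accuracy` and its inputs are hypothetical, and the handling of ties is an arbitrary choice.

```python
from typing import Sequence


def pairwise_accuracy(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    human_prefers_a: Sequence[bool],
) -> float:
    """Return the percentage of pairs where the model agrees with the human label.

    scores_a / scores_b are reward scores for the two images of each pair
    (hypothetical inputs; any scalar reward model could produce them).
    human_prefers_a[i] is True when annotators preferred image A in pair i.
    """
    if not (len(scores_a) == len(scores_b) == len(human_prefers_a)):
        raise ValueError("all inputs must have the same length")
    if len(scores_a) == 0:
        raise ValueError("at least one pair is required")

    # The model "prefers A" only when A scores strictly higher;
    # a tie is treated as preferring B (an assumption of this sketch).
    correct = sum(
        (a > b) == bool(pref)
        for a, b, pref in zip(scores_a, scores_b, human_prefers_a)
    )
    return 100.0 * correct / len(scores_a)


if __name__ == "__main__":
    acc = pairwise_accuracy(
        scores_a=[0.9, 0.2, 0.5, 0.7],
        scores_b=[0.1, 0.8, 0.6, 0.3],
        human_prefers_a=[True, False, True, True],
    )
    print(f"pairwise accuracy: {acc:.1f}%")
```

In this toy example the model disagrees with the annotators only on the third pair. A criterion-level variant could apply the same function once per criterion, using per-criterion scores and labels, though how DyCoBench-1K aggregates across criteria is not specified here.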