Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

About

Reward models (RMs) play a critical role in aligning AI behaviors with human preferences, yet they face two fundamental challenges: (1) Modality Imbalance, where most RMs focus mainly on the text and image modalities and offer limited support for video, audio, and other modalities; and (2) Preference Rigidity, where training on fixed binary preference pairs fails to capture the complexity and diversity of personalized preferences. To address these challenges, we propose Omni-Reward, a step toward generalist omni-modal reward modeling with support for free-form preferences, consisting of: (1) Evaluation: We introduce Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities: text, image, video, audio, and 3D; (2) Data: We construct Omni-RewardData, a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs; (3) Model: We propose Omni-RewardModel, which includes both discriminative and generative RMs and achieves strong performance on Omni-RewardBench as well as on other widely used reward modeling benchmarks.

Zhuoran Jin, Hongbang Yuan, Kejian Zhu, Jiachun Li, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao • 2025

Related benchmarks

Task                        Dataset                  Result            Rank
Multimodal Reward Modeling  VL-RewardBench           Accuracy: 72      102
Multimodal Reward Modeling  Multimodal RewardBench   Accuracy: 85.1    50
Multimodal Reward Modeling  RewardBench MM-RLHF      MCQ Score: 40.48  20
