Generative RLHF-V: Learning Principles from Multi-modal Human Preference

About

Training multi-modal large language models (MLLMs) that align with human intentions is a long-standing challenge. Traditional score-only reward models for alignment suffer from low accuracy, weak generalization, and poor interpretability, which hinders alignment methods such as reinforcement learning from human feedback (RLHF). Generative reward models (GRMs) leverage MLLMs' intrinsic reasoning capabilities to discriminate between pair-wise responses, but their pair-wise paradigm makes it hard to generalize to learnable rewards. We introduce Generative RLHF-V, a novel alignment framework that integrates GRMs with multi-modal RLHF. We propose a two-stage pipeline: (1) multi-modal generative reward modeling from RL, where RL guides GRMs to actively capture human intention and then predict the correct pair-wise scores; and (2) RL optimization from grouped comparison, which enhances multi-modal RL scoring precision through grouped response comparison. Experimental results demonstrate that, in addition to out-of-distribution generalization of RM discrimination, our framework improves 4 MLLMs' performance across 7 benchmarks by 18.1%, whereas baseline RLHF achieves only 5.3%. We further validate that Generative RLHF-V achieves near-linear improvement as the number of candidate responses increases. Our code and models can be found at https://generative-rlhf-v.github.io.

Jiayi Zhou, Jiaming Ji, Boyuan Chen, Jiapeng Sun, Wenqi Chen, Donghai Hong, Sirui Han, Yike Guo, Yaodong Yang • 2025
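
The abstract names grouped comparison as the second stage but does not spell out the scoring rule. The sketch below is one plausible reading, not the authors' implementation: each candidate response is compared pair-wise against every other candidate in its group, and its reward is the mean of the scores it receives. The names `grouped_comparison_rewards` and `toy_length_judge` are hypothetical, the judge is a toy stand-in for a generative reward model, and the image input is omitted for brevity.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a pair-wise judge takes (prompt, response_a, response_b)
# and returns one score per response. A real GRM would also receive the image.
PairJudge = Callable[[str, str, str], Tuple[float, float]]


def grouped_comparison_rewards(
    prompt: str, responses: Sequence[str], judge: PairJudge
) -> List[float]:
    """Average each response's pair-wise scores over all other group members.

    Assumption: the per-response reward is the mean over its n - 1 comparisons.
    """
    n = len(responses)
    if n < 2:
        raise ValueError("grouped comparison needs at least two responses")
    totals = [0.0] * n
    # Visit every unordered pair once and credit both responses.
    for i, j in combinations(range(n), 2):
        score_i, score_j = judge(prompt, responses[i], responses[j])
        totals[i] += score_i
        totals[j] += score_j
    return [total / (n - 1) for total in totals]


def toy_length_judge(prompt: str, a: str, b: str) -> Tuple[float, float]:
    """Toy stand-in for a GRM: the longer response wins, ties split the point."""
    len_a, len_b = len(a.split()), len(b.split())
    if len_a == len_b:
        return 0.5, 0.5
    return (1.0, 0.0) if len_a > len_b else (0.0, 1.0)


rewards = grouped_comparison_rewards(
    "Describe the image.",
    ["A cat.", "A cat on a mat.", "A grey cat sleeping on a red mat."],
    toy_length_judge,
)
```

Under this reading, a group of n candidates costs n(n - 1)/2 judge calls, so the number of comparisons grows quadratically as more candidate responses are added.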

Related benchmarks

Task	Dataset	Result
Safety & Helpfulness Evaluation	SPA-VL	Safety Score 59.81	32
Safety & Helpfulness Evaluation	VLGuard	Safety Score 59.34	32
Multi-turn MLLM Safety Evaluation	STEER-Beaver Multi-turn	Safety 30.23	18
Multi-turn MLLM Safety Evaluation	STEER-VLS Multi-turn	Safety Score 21.45	18
Multi-turn MLLM Safety Evaluation	STEER-DyS Multi-turn	Safety Score 18.87	18
Multi-turn MLLM Safety Evaluation	STEER-SPA Multi-turn	Safety 17.11	18
Multi-turn MLLM Safety Evaluation	STEER Average Multi-turn	Safety Score 19.53	18
Safety & Helpfulness Evaluation	Beavertails	Safety Score 49.49	18
Safety & Helpfulness Evaluation	MM-Safety	Safety Score 46.68	18
Safety & Helpfulness Evaluation	VLSBench	Safety Score 57.3	18

Showing 10 of 11 rows
