
Improving Video Generation with Human Feedback

About

Video generation has achieved significant advances through rectified flow techniques, but issues such as non-smooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multiple dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices affect its rewarding efficacy. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models. These include two training-time strategies, direct preference optimization for flow (Flow-DPO) and reward-weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and that Flow-DPO achieves superior performance compared to both Flow-RWR and supervised fine-tuning. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs.
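To make the Flow-DPO idea concrete, the following is a minimal sketch (not the paper's implementation) of a DPO-style preference loss adapted to flow matching. It assumes that, at a sampled timestep, the implicit log-likelihood ratio of a rectified-flow policy against a frozen reference reduces to a difference of flow-matching MSE errors; the function names and the `beta` temperature are illustrative, not taken from the paper.

```python
import math


def flow_matching_error(v_pred, v_target):
    """Mean-squared flow-matching error between predicted and target velocities."""
    return sum((p - t) ** 2 for p, t in zip(v_pred, v_target)) / len(v_pred)


def flow_dpo_loss(policy_w, ref_w, target_w, policy_l, ref_l, target_l, beta=1.0):
    """Sketch of a DPO-style loss for flow models.

    `*_w` are velocity predictions/targets for the human-preferred video,
    `*_l` for the dispreferred one, both at the same sampled timestep.
    `beta` (hypothetical here) scales the implicit reward margin.
    """
    # How much the policy improves over the reference on each sample.
    adv_w = flow_matching_error(policy_w, target_w) - flow_matching_error(ref_w, target_w)
    adv_l = flow_matching_error(policy_l, target_l) - flow_matching_error(ref_l, target_l)
    # Lower error on the preferred sample (relative to the reference)
    # yields a positive margin and therefore a small loss.
    margin = -beta * (adv_w - adv_l)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

Intuitively, the loss rewards the policy for fitting the preferred video's flow-matching target better than the reference model does, while penalizing the same improvement on the dispreferred video, mirroring how DPO trades off chosen and rejected responses in language models.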

Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Video Generation | VBench | Quality Score | 85.02 | 111 |
| Video Generation | VBench 2.0 (test) | Total Score | 49.27 | 44 |
| Video Generation | VBench aesthetic and imaging quality dimensions | Aesthetic Quality | 0.6353 | 15 |
| Video Generation | VideoJAM (test) | Aesthetic Quality | 0.5611 | 15 |
| Human Preference Alignment | REACT-Video | Acc (Tie, Overall) | 41.5 | 12 |
| Video Preference Alignment | GenAI-Bench | Alignment Accuracy (w/ ties) | 49.41 | 11 |
| Physical Plausibility and Subject Deformity | Curated Prompt Set (OOD) | RM ACC | 71.6 | 8 |
| TA | Internal ID (train) | RM ACC | 0.495 | 8 |
| TA | Curated Prompt Set (OOD) | RM Accuracy | 0.4914 | 8 |
| Physical Plausibility and Subject Deformity | Internal Dataset (ID train) | RM ACC | 54.4 | 8 |
Showing 10 of 18 rows
