
Flow-GRPO: Training Flow Matching Models via Online RL

About

We propose Flow-GRPO, the first method to integrate online policy gradient reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps, enabling statistical sampling for RL exploration; and (2) a Denoising Reduction strategy that reduces training denoising steps while retaining the original number of inference steps, significantly improving sampling efficiency without sacrificing performance. Empirically, Flow-GRPO is effective across multiple text-to-image tasks. For compositional generation, RL-tuned SD3.5-M generates nearly perfect object counts, spatial relations, and fine-grained attributes, increasing GenEval accuracy from $63\%$ to $95\%$. In visual text rendering, accuracy improves from $59\%$ to $92\%$, greatly enhancing text generation. Flow-GRPO also achieves substantial gains in human preference alignment. Notably, very little reward hacking occurred, meaning rewards did not increase at the cost of appreciable image quality or diversity degradation.
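The ODE-to-SDE idea in the abstract can be checked numerically on a toy problem. The sketch below is not the paper's implementation. It assumes a 1-D Gaussian "data" distribution, so that the rectified-flow velocity and the score have closed forms, and a constant noise level `sigma`. All names (`S_DATA`, `marginal_var`, `velocity`, `score`, `simulate`) are hypothetical. It relies on the standard Fokker-Planck fact that, when integrating from noise (t = 1) to data (t = 0), adding the drift correction `-(sigma^2 / 2) * score` together with injected noise leaves the marginals of the deterministic ODE unchanged.

```python
import math
import random
import statistics

S_DATA = 0.5  # std of the toy 1-D Gaussian "data" distribution (assumption)


def marginal_var(t):
    # Variance of x_t = (1 - t) * x_0 + t * eps,
    # with x_0 ~ N(0, S_DATA^2) and eps ~ N(0, 1) independent.
    return (1.0 - t) ** 2 * S_DATA ** 2 + t ** 2


def velocity(x, t):
    # Exact marginal velocity E[eps - x_0 | x_t = x] for this Gaussian toy.
    return (t - (1.0 - t) * S_DATA ** 2) / marginal_var(t) * x


def score(x, t):
    # Score of the Gaussian marginal p_t = N(0, marginal_var(t)).
    return -x / marginal_var(t)


def simulate(n, steps, sigma, seed=0):
    """Euler(-Maruyama) integration from t = 1 (noise) down to t = 0 (data).

    sigma = 0 gives the deterministic ODE sampler; sigma > 0 gives a
    stochastic sampler whose marginals should match the ODE's.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    h = 1.0 / steps
    noise_scale = sigma * math.sqrt(h)
    for k in range(steps):
        t = 1.0 - k * h
        # Both velocity and score are linear in x here, so the drift
        # v - (sigma^2 / 2) * score equals coef * x for a per-step coef.
        coef = velocity(1.0, t) - 0.5 * sigma ** 2 * score(1.0, t)
        if sigma > 0.0:
            # Time runs backwards (dt = -h), hence the minus sign on the drift.
            xs = [x - coef * x * h + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
        else:
            xs = [x - coef * x * h for x in xs]
    return xs
```

Calling `simulate(20000, 100, sigma=0.0)` and `simulate(20000, 100, sigma=0.5)` and comparing `statistics.pvariance` of each result against `S_DATA ** 2` should show both samplers landing near the data variance, up to discretization and sampling error. Only the stochastic sampler produces different outputs from the same starting noise, which is the property the abstract describes as enabling RL exploration.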

Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, Wanli Ouyang • 2025

Related benchmarks

Task                            Dataset                 Metric                 Result  Rank
Text-to-Image Generation        GenEval                 Overall Score          95      467
Text-to-Image Generation        GenEval                 GenEval Score          95      277
Text-to-Image Generation        GenEval (test)          Two Obj. Acc           94.2    169
Text-to-Image Generation        T2I-CompBench++         Non-Spatial            0.3195  31
Compositional Image Generation  GenEval                 Overall Score          0.95    22
Composition Image Generation    GenEval                 GenEval Score          95      20
Text to Image                   GenEval 11 (test)       Accuracy (Single Obj)  100     19
Text to Image                   PartiPrompts 42 (test)  VQAScore               88      19
Text to Image                   PickScore 15 (test)     PickScore              23.566  17
Text to Image                   OCR 6 (test)            OCR Score              97.1    17
Showing 10 of 49 rows
