Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching

About

Reinforcement Learning (RL) has recently emerged as a powerful technique for improving image and video generation in Diffusion and Flow Matching models, specifically for enhancing output quality and alignment with prompts. A critical step in applying online RL methods to Flow Matching is the introduction of stochasticity into the deterministic framework, commonly realized with a Stochastic Differential Equation (SDE). Our investigation reveals a significant drawback to this approach: SDE-based sampling introduces pronounced noise artifacts in the generated images, which we find to be detrimental to the reward learning process. A rigorous theoretical analysis traces the origin of this noise to an excess of stochasticity injected during inference. To address this, we draw inspiration from Denoising Diffusion Implicit Models (DDIM) to reformulate the sampling process. Our proposed method, Coefficients-Preserving Sampling (CPS), eliminates these noise artifacts. This leads to more accurate reward modeling, ultimately enabling faster and more stable convergence for reinforcement learning-based optimizers like Flow-GRPO and Dance-GRPO. Code will be released at https://github.com/IamCreateAI/FlowCPS.

Feng Wang, Zihao Yu • 2025
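
The DDIM-inspired idea in the abstract can be illustrated with a minimal sketch. This is not the authors' released CPS implementation. It assumes the rectified-flow convention x_t = (1 - t) * x0 + t * eps with velocity v = eps - x0, and it uses hypothetical names (`predict_endpoints`, `coefficient_preserving_step`) and a hypothetical stochasticity knob `eta` in [0, 1]. The noise coefficient required at the next timestep is split between the predicted noise and fresh noise, so the total noise level stays on the schedule regardless of how much randomness is injected.

```python
import numpy as np


def predict_endpoints(x_t, v, t):
    """Recover the clean-sample and noise estimates from a velocity prediction.

    Assumed convention (not taken from the paper's code):
    x_t = (1 - t) * x0 + t * eps and v = eps - x0.
    """
    x0_hat = x_t - t * v
    eps_hat = x_t + (1.0 - t) * v
    return x0_hat, eps_hat


def coefficient_preserving_step(x_t, v, t, t_next, eta, rng):
    """One DDIM-style stochastic step from time t to t_next (illustrative only).

    eta = 0 reproduces the deterministic Euler/ODE step; eta = 1 replaces
    the predicted noise entirely with fresh noise.
    """
    x0_hat, eps_hat = predict_endpoints(x_t, v, t)
    fresh = rng.standard_normal(x_t.shape)
    sigma = eta * t_next                     # share given to fresh noise
    keep = np.sqrt(t_next**2 - sigma**2)     # share kept for predicted noise
    # keep**2 + sigma**2 == t_next**2, so the noise coefficient is preserved.
    return (1.0 - t_next) * x0_hat + keep * eps_hat + sigma * fresh


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.standard_normal(100_000)
    x0 = np.zeros_like(eps)                  # toy data, so x_t is pure noise
    t, t_next = 0.8, 0.6
    x_t = (1.0 - t) * x0 + t * eps
    v = eps - x0                             # oracle velocity for the toy case
    for eta in (0.0, 0.5, 1.0):
        x_next = coefficient_preserving_step(x_t, v, t, t_next, eta, rng)
        print(f"eta={eta}: empirical noise std = {np.std(x_next):.3f}")
```

In this toy setup the empirical standard deviation of the result should stay close to `t_next` for every `eta`, which is the property the sketch is meant to show. An update whose injected noise is not compensated in this way would leave the sample noisier than the schedule expects at that timestep.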

Related benchmarks

Task	Dataset	Metric	Value	
Text-to-Image Generation	GenEval	Overall Score (GenEval)	0.972	153
Text-to-Image Generation	HPS v2	--	--	71
Text-to-Image Generation	GenEval 2	Soft TIFA GM	91.2	62
Text-to-Image Generation	GenEval2 In-Domain	GenEval2 Score	91.2	50
Text-to-Image Generation	PickScore (Out-of-Domain)	CLIP Score	0.288	46
Text-to-Image Generation	PickScore	PickScore	24.594	44
Text-to-Image Generation	UniGenBench 600 prompts (test)	HPS-v2 Score	34.92	21
Text-to-Image Generation	UniGenBench	Attribute Score	69.66	21
KL Divergence Evaluation	FLUX2-9B	KL Divergence	240	15
KL Divergence Evaluation	SD 3.5	KL Divergence (x10^-3)	2.41	10

Showing 10 of 15 rows
