ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

About

We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov process whose likelihood can be computed exactly and straightforwardly. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow on representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. In challenging legged locomotion tasks, the episode reward of Rectified Flow policies achieved an average net increase of 135.36% after fine-tuning, while using fewer denoising steps and 82.63% less wall time than the state-of-the-art diffusion RL fine-tuning method DPPO [43]. In state-based and visual manipulation tasks, the success rate of Shortcut Model policies achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step, reaching performance comparable to fine-tuned DDIM policies while saving an average of 23.20% of computation time. Project webpage: https://reinflow.github.io/
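To make the "exact likelihood" idea concrete, here is a minimal sketch, assuming a plain Euler discretization of the flow with diagonal Gaussian noise added at every step. Once noise is injected, each step a_{k+1} ~ N(a_k + v(t_k, a_k)·Δt, diag(σ_k²)) is a Gaussian transition, so the log-likelihood of a sampled chain is simply the sum of per-step Gaussian log-densities. The names `sample_chain`, `chain_log_prob`, `velocity`, and `noise_std`, the toy velocity field, and the constant noise level are all hypothetical illustrations; in ReinFlow the noise comes from a learnable network, and its actual parameterization and schedule may differ from this sketch.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical signatures: velocity(t, a) -> v and noise_std(t, a) -> per-dim std.
VelocityFn = Callable[[float, Vector], Vector]
NoiseFn = Callable[[float, Vector], Vector]

LOG_2PI = math.log(2.0 * math.pi)


def gaussian_log_prob(x: Vector, mean: Vector, std: Vector) -> float:
    """Log-density of a diagonal Gaussian N(mean, diag(std**2)) evaluated at x."""
    total = 0.0
    for xi, mi, si in zip(x, mean, std):
        z = (xi - mi) / si
        total += -0.5 * z * z - math.log(si) - 0.5 * LOG_2PI
    return total


def transition_mean(velocity: VelocityFn, t: float, a: Vector, dt: float) -> Vector:
    """Deterministic Euler step a + v(t, a) * dt of the original flow policy."""
    return [ai + vi * dt for ai, vi in zip(a, velocity(t, a))]


def sample_chain(velocity: VelocityFn, noise_std: NoiseFn, dim: int,
                 num_steps: int, rng: random.Random) -> Tuple[List[Vector], float]:
    """Roll out the noise-injected flow; return (chain a_0..a_K, its log-likelihood)."""
    dt = 1.0 / num_steps
    # a_0 ~ N(0, I), the usual flow-matching source distribution.
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    chain = [a]
    log_prob = gaussian_log_prob(a, [0.0] * dim, [1.0] * dim)
    for k in range(num_steps):
        t = k * dt
        mean = transition_mean(velocity, t, a, dt)
        std = noise_std(t, a)  # a learnable noise network in ReinFlow (assumed here)
        # Injected noise turns the deterministic step into a Gaussian transition.
        a = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
        chain.append(a)
        log_prob += gaussian_log_prob(a, mean, std)
    return chain, log_prob


def chain_log_prob(chain: List[Vector], velocity: VelocityFn, noise_std: NoiseFn) -> float:
    """Re-score a stored chain, e.g. under updated parameters for a likelihood ratio."""
    num_steps = len(chain) - 1
    dt = 1.0 / num_steps
    dim = len(chain[0])
    log_prob = gaussian_log_prob(chain[0], [0.0] * dim, [1.0] * dim)
    for k in range(num_steps):
        t = k * dt
        mean = transition_mean(velocity, t, chain[k], dt)
        log_prob += gaussian_log_prob(chain[k + 1], mean, noise_std(t, chain[k]))
    return log_prob


if __name__ == "__main__":
    target = [0.5, -0.25]

    def velocity(t: float, a: Vector) -> Vector:
        # Toy straight-line velocity toward a fixed target (rectified-flow style).
        return [(g - ai) / max(1.0 - t, 1e-3) for g, ai in zip(target, a)]

    def noise_std(t: float, a: Vector) -> Vector:
        return [0.1 for _ in a]  # constant noise, for illustration only

    rng = random.Random(0)
    for steps in (1, 4):  # one-step and four-step sampling
        chain, lp = sample_chain(velocity, noise_std, 2, steps, rng)
        print(steps, chain[-1], lp, chain_log_prob(chain, velocity, noise_std))
```

Because the likelihood is a closed-form sum over steps, the same stored chain can be re-scored under new parameters, which is what a policy-gradient update needs; the sum has one term per denoising step plus the source term, so it stays cheap even at one or four steps.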

Tonghe Zhang, Chao Yu, Sichang Su, Yu Wang• 2025

Related benchmarks

Task	Dataset	Result
Robot Manipulation	Franka-Kitchen	--	15
Robot Policy Inference Efficiency	NVIDIA RTX 4090 simulation (inference)	Inference Time (ms): 2.5	12
Robot Policy Inference Efficiency	NVIDIA RTX 2080 physical robot deployment (inference)	Inference Time (ms): 12	12
Dexterous Manipulation	Dexterous Manipulation Simulation (test)	Grasping: 58.4	12
Robot Manipulation	RoboMimic	Can Success Rate: 97.8	10
Locomotion	D4RL Locomotion	Hopper-v2 Score: 3.25e+3	6
Reinforcement Learning	D4RL	Hopper Score: 3.25e+3	6
Lift	RoboMimic MH 100 trajectories Simplified (multi-human)	Success Rate: 100	5
Can	RoboMimic MH 100 trajectories Simplified (multi-human)	Success Rate: 99	5
Square	RoboMimic MH 100 trajectories Simplified (multi-human)	Success Rate: 80	4
