
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

About

We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov process whose likelihood can be computed exactly and straightforwardly. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly with very few denoising steps or even one. We benchmark ReinFlow on representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. After fine-tuning on challenging legged locomotion tasks, the episode reward of Rectified Flow policies grew by an average net 135.36%, while using fewer denoising steps and 82.63% less wall time than the state-of-the-art diffusion RL fine-tuning method DPPO [43]. In state-based and visual manipulation tasks, the success rate of Shortcut Model policies rose by an average net 40.34% after fine-tuning with ReinFlow at four denoising steps or even one, reaching performance comparable to fine-tuned DDIM policies while reducing computation time by 23.20% on average. Project webpage: https://reinflow.github.io/

Tonghe Zhang, Chao Yu, Sichang Su, Yu Wang • 2025
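
The noise-injection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `velocity` and `noise_std` are hypothetical stand-ins for the learned velocity and noise networks, the Euler discretization and the standard-normal starting sample are assumptions, and all function names are made up for illustration. The point is only that once each denoising step is a Gaussian transition, the log-likelihood of a sampled denoising path is a plain sum of Gaussian log-densities.

```python
import math
import random


def velocity(x, t):
    """Hypothetical stand-in for a learned velocity network v(x, t)."""
    return [-xi + t for xi in x]


def noise_std(x, t):
    """Hypothetical stand-in for a learnable noise network (constant here)."""
    return [0.1 for _ in x]


def gaussian_log_prob(x, mean, std):
    """Log-density of a diagonal Gaussian, summed over dimensions."""
    return sum(
        -0.5 * ((xi - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2 * math.pi)
        for xi, mi, si in zip(x, mean, std)
    )


def transition(x, t, dt):
    """Mean and std of the Gaussian step x_k -> x_{k+1} (noisy Euler step)."""
    v = velocity(x, t)
    mean = [xi + vi * dt for xi, vi in zip(x, v)]
    return mean, noise_std(x, t)


def sample_noisy_flow(action_dim, num_steps, rng):
    """Sample a denoising path and accumulate its exact log-likelihood."""
    dt = 1.0 / num_steps
    # Assumed starting distribution: standard normal noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    log_prob = gaussian_log_prob(x, [0.0] * action_dim, [1.0] * action_dim)
    path = [x]
    for k in range(num_steps):
        mean, std = transition(x, k * dt, dt)
        # Injected noise makes the step stochastic, hence a Markov transition.
        x = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
        log_prob += gaussian_log_prob(x, mean, std)
        path.append(x)
    return x, log_prob, path


def path_log_prob(path):
    """Recompute the log-likelihood of a stored path, as an RL update would."""
    num_steps = len(path) - 1
    dt = 1.0 / num_steps
    dim = len(path[0])
    total = gaussian_log_prob(path[0], [0.0] * dim, [1.0] * dim)
    for k in range(num_steps):
        mean, std = transition(path[k], k * dt, dt)
        total += gaussian_log_prob(path[k + 1], mean, std)
    return total
```

For example, `sample_noisy_flow(action_dim=3, num_steps=4, rng=random.Random(0))` returns the final action, the path log-likelihood, and the stored path. Setting `num_steps=1` gives the single-step case mentioned in the abstract. In an actual fine-tuning loop the two stand-in functions would presumably be neural networks, and `path_log_prob` would be evaluated under an autodiff framework so that a policy-gradient objective can be differentiated through it.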

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robot Policy Inference Efficiency | NVIDIA RTX 4090 simulation (inference) | Inference Time (ms) | 2.5 | 12
Robot Policy Inference Efficiency | NVIDIA RTX 2080 physical robot deployment (inference) | Inference Time (ms) | 12 | 12
Dexterous Manipulation | Dexterous Manipulation Simulation (test) | Grasping | 58.4 | 12
Lift | RoboMimic MH 100 trajectories Simplified (multi-human) | Success Rate | 100 | 5
Can | RoboMimic MH 100 trajectories Simplified (multi-human) | Success Rate | 99 | 5
Square | RoboMimic MH 100 trajectories Simplified (multi-human) | Success Rate | 80 | 4
Transport | RoboMimic MH 100 trajectories Simplified | Success Rate | 85 | 4
