
$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models

About

Vision-Language-Action (VLA) models enable robots to understand and perform complex tasks from multimodal inputs. Although recent work explores using reinforcement learning (RL) to automate the laborious data collection required to scale supervised fine-tuning (SFT), applying RL to large-scale flow-based VLAs (e.g., $\pi_0$, $\pi_{0.5}$) remains challenging because flow matching makes action log-likelihoods intractable. We address this challenge with $\pi_{\texttt{RL}}$, which features two technical approaches: (1) **Flow-Noise** models the denoising process as a discrete-time MDP with a learnable noise network, enabling exact log-likelihood computation. (2) **Flow-SDE** integrates denoising with agent-environment interaction, formulating a two-layer MDP that employs an ODE-to-SDE conversion for efficient RL exploration. We evaluate $\pi_{\texttt{RL}}$ across various benchmarks, with experiments demonstrating that RL yields significant performance improvements in both in-distribution and out-of-distribution settings.
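To make the ODE-to-SDE idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of why converting a deterministic flow-matching sampler into a stochastic one yields tractable per-step log-likelihoods. The `velocity` field is a toy stand-in for a learned $v_\theta(x, t)$; the step sizes, noise scale `sigma`, and step count are illustrative assumptions. Each noisy denoising step is a Gaussian action in a discrete-time MDP, so its log-probability is a closed-form Gaussian log-density that an RL objective (e.g., PPO) could differentiate through.

```python
import numpy as np

def velocity(x, t):
    # Toy stand-in for the learned flow-matching velocity field v_theta(x, t).
    # Assumption: a simple contracting field, purely for illustration.
    return -x

def ode_step(x, t, dt):
    # Deterministic Euler step of the probability-flow ODE.
    # No tractable log-likelihood: the transition is a delta function.
    return x + velocity(x, t) * dt

def sde_step(x, t, dt, sigma, rng):
    # Stochastic counterpart: inject Gaussian noise so each denoising step
    # becomes a stochastic action, with a closed-form Gaussian log-density.
    mean = x + velocity(x, t) * dt
    var = sigma ** 2 * dt
    x_next = mean + np.sqrt(var) * rng.standard_normal(x.shape)
    # log N(x_next; mean, var * I), summed over action dimensions
    logp = -0.5 * np.sum((x_next - mean) ** 2 / var + np.log(2 * np.pi * var))
    return x_next, logp

# Roll out K denoising steps and accumulate the chain log-likelihood.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # initial noise sample (4-dim toy action)
total_logp, K, dt, sigma = 0.0, 10, 0.1, 0.2
for k in range(K):
    x, logp = sde_step(x, k * dt, dt, sigma, rng)
    total_logp += logp
```

The deterministic `ode_step` is kept for contrast: it has no usable transition density, which is exactly the obstacle the paper's SDE formulation removes.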

Kang Chen, Zhihao Liu, Tonghe Zhang, Zhen Guo, Si Xu, Hao Lin, Hongzhi Zang, Xiang Li, Quanlu Zhang, Zhaofei Yu, Guoliang Fan, Tiejun Huang, Yu Wang, Chao Yu • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Robot Manipulation | SimplerEnv WidowX Robot tasks | Average Success Rate | 7.96e+3 | 32 |
| Robotic Manipulation | ManiSkill3 | Average Success Rate | 80.1 | 21 |
| Put Carrot on Plate | SimplerEnv WidowX | Success Rate | 0.973 | 18 |
| Put Spoon on Towel | SimplerEnv WidowX | Success Rate | 82.7 | 18 |
| Stack Green on Yellow | SimplerEnv WidowX | Success Rate | 83.3 | 18 |
| Put Eggplant in Basket | SimplerEnv WidowX | Success Rate | 55 | 18 |
| Robot Manipulation | LIBERO few-shot | Spatial Success Rate | 99.6 | 14 |
| Robot Manipulation | ManiSkill (Out-Of-Distribution) | Vision Score | 68 | 12 |
| Robot Manipulation | ManiSkill In-Distribution | Success Rate | 90.9 | 9 |
