STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models

About

Recent Vision-Language-Action (VLA) models, powered by large language models and reinforcement learning-based fine-tuning, have shown remarkable progress in robotic manipulation. Existing methods often treat long-horizon actions as linguistic sequences and apply trajectory-level optimization methods such as Trajectory-wise Preference Optimization (TPO) or Proximal Policy Optimization (PPO), leading to coarse credit assignment and unstable training. However, unlike language, where a unified semantic meaning is preserved despite flexible sentence order, action trajectories progress through causally chained stages with different learning difficulties. This motivates progressive, stage-wise optimization. To this end, we present Stage-Aware Reinforcement (STARE), a module that decomposes a long-horizon action trajectory into semantically meaningful stages and provides dense, interpretable, and stage-aligned reinforcement signals. Integrating STARE into TPO and PPO yields Stage-Aware TPO (STA-TPO) and Stage-Aware PPO (STA-PPO) for offline stage-wise preference and online intra-stage interaction, respectively. Building further on supervised fine-tuning as initialization, we propose Imitation -> Preference -> Interaction (IPI), a serial fine-tuning pipeline for improving action accuracy in VLA models. Experiments on SimplerEnv and ManiSkill3 demonstrate substantial gains, achieving state-of-the-art success rates of 98.0 percent on SimplerEnv and 96.4 percent on ManiSkill3 tasks.

Feng Xu, Guangyao Zhai, Xin Kong, Tingzhong Fu, Daniel F.N. Gordon, Xueli An, Benjamin Busam • 2025
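
The stage-wise idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the stage labels ("reach", "grasp", "place"), the per-step reward values, and the helper names below are all hypothetical, chosen only to show how splitting a trajectory into stages gives a per-stage comparison instead of a single trajectory-level one.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Step:
    stage: str     # hypothetical stage label, e.g. "reach" or "grasp"
    reward: float  # hypothetical per-step reward signal


def stage_scores(trajectory):
    """Group consecutive steps by stage label and sum the rewards in each segment."""
    return [
        (stage, sum(step.reward for step in steps))
        for stage, steps in groupby(trajectory, key=lambda s: s.stage)
    ]


def stage_preferences(traj_a, traj_b):
    """Compare two trajectories stage by stage instead of as a whole."""
    prefs = {}
    for (stage_a, score_a), (stage_b, score_b) in zip(stage_scores(traj_a), stage_scores(traj_b)):
        if stage_a != stage_b:
            break  # stage sequences diverge, so later stages are not comparable
        if score_a > score_b:
            prefs[stage_a] = "a"
        elif score_b > score_a:
            prefs[stage_a] = "b"
        else:
            prefs[stage_a] = "tie"
    return prefs


# Two toy trajectories (made-up numbers) that go through the same three stages.
traj_a = [Step("reach", 0.5), Step("reach", 0.4), Step("grasp", 0.2), Step("place", 0.9)]
traj_b = [Step("reach", 0.3), Step("reach", 0.3), Step("grasp", 0.8), Step("place", 0.9)]

prefs = stage_preferences(traj_a, traj_b)
print(prefs)
```

In this toy case a single trajectory-level comparison would hide that one trajectory is better at reaching while the other is better at grasping; the per-stage comparison keeps that distinction, which is the kind of finer credit assignment the abstract argues for.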

Related benchmarks

Task	Dataset	Metric	Result (%)
Robot Manipulation	SimplerEnv WidowX Robot tasks	Average Success Rate	98.0	32
Robotic Manipulation	ManiSkill3	Average Success Rate	96.4	28
Put Carrot on Plate	SimplerEnv WidowX	Success Rate	98.7	18
Put Eggplant in Basket	SimplerEnv WidowX	Success Rate	97.5	18
Put Spoon on Towel	SimplerEnv WidowX	Success Rate	98	18
Stack Green on Yellow	SimplerEnv WidowX	Success Rate	98	18
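
As a quick consistency check, the four per-task SimplerEnv WidowX rates can be averaged and compared with the reported 98.0 percent average. The sketch below assumes the Put Carrot on Plate entry is 98.7 percent (that is, a fraction of 0.987) and that the average is an unweighted mean over these four tasks; neither assumption is stated on this page.

```python
# Per-task success rates in percent, taken from the table above.
# Assumption: the carrot task's rate is 98.7 percent.
rates = {
    "Put Carrot on Plate": 98.7,
    "Put Eggplant in Basket": 97.5,
    "Put Spoon on Towel": 98.0,
    "Stack Green on Yellow": 98.0,
}

# Unweighted mean over the four tasks (an assumption about how the average is formed).
mean_rate = sum(rates.values()) / len(rates)
print(f"mean success rate: {mean_rate:.2f} percent")
```

The mean comes out at roughly 98.05 percent, in line with the 98.0 percent average given in the abstract once rounding of the per-task values is taken into account.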
