STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models

About

Recent advances in Vision-Language-Action (VLA) models, powered by large language models and reinforcement learning-based fine-tuning, have shown remarkable progress in robotic manipulation. Existing methods often treat long-horizon actions as linguistic sequences and apply trajectory-level optimization methods such as Trajectory-wise Preference Optimization (TPO) or Proximal Policy Optimization (PPO), which leads to coarse credit assignment and unstable training. However, unlike language, where a unified semantic meaning is preserved despite flexible sentence order, action trajectories progress through causally chained stages with different learning difficulties. This motivates progressive stage optimization. We therefore present Stage-Aware Reinforcement (STARE), a module that decomposes a long-horizon action trajectory into semantically meaningful stages and provides dense, interpretable, and stage-aligned reinforcement signals. Integrating STARE into TPO and PPO yields Stage-Aware TPO (STA-TPO) and Stage-Aware PPO (STA-PPO) for offline stage-wise preference and online intra-stage interaction, respectively. Building further on supervised fine-tuning as initialization, we propose Imitation -> Preference -> Interaction (IPI), a serial fine-tuning pipeline for improving action accuracy in VLA models. Experiments on SimplerEnv and ManiSkill3 demonstrate substantial gains, with state-of-the-art success rates of 98.0% on SimplerEnv and 96.4% on ManiSkill3 tasks.
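
The contrast between trajectory-level and stage-level credit assignment described above can be illustrated with a small toy sketch. This is not the authors' implementation: the `Step` record, the stage labels ("reach", "grasp", "place"), the per-step `progress` score, and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    stage: str       # hypothetical stage label such as "reach" or "grasp"
    progress: float  # hypothetical within-stage progress score in [0, 1]


def trajectory_level_credit(steps: List[Step], success: bool) -> List[float]:
    """Assign the same sparse outcome signal to every step."""
    return [1.0 if success else 0.0] * len(steps)


def stage_final_progress(steps: List[Step]) -> Dict[str, float]:
    """Record the last progress value observed in each stage."""
    final: Dict[str, float] = {}
    for step in steps:
        final[step.stage] = step.progress
    return final


def stage_level_credit(steps: List[Step]) -> List[float]:
    """Give each step the final progress reached within its own stage."""
    final = stage_final_progress(steps)
    return [final[step.stage] for step in steps]


def stage_preferences(a: List[Step], b: List[Step]) -> Dict[str, str]:
    """Compare two trajectories stage by stage instead of as a whole."""
    fa, fb = stage_final_progress(a), stage_final_progress(b)
    prefs: Dict[str, str] = {}
    for stage in fa:
        if stage not in fb:
            continue  # only compare stages present in both trajectories
        if fa[stage] > fb[stage]:
            prefs[stage] = "a"
        elif fb[stage] > fa[stage]:
            prefs[stage] = "b"
        else:
            prefs[stage] = "tie"
    return prefs


# A rollout that reaches and grasps the object but fails to place it.
failed = [Step("reach", 0.5), Step("reach", 1.0), Step("grasp", 1.0), Step("place", 0.2)]
# A second rollout that grasps poorly yet gets further in placing.
other = [Step("reach", 1.0), Step("grasp", 0.4), Step("place", 0.6)]

print(trajectory_level_credit(failed, success=False))  # [0.0, 0.0, 0.0, 0.0]
print(stage_level_credit(failed))                      # [1.0, 1.0, 1.0, 0.2]
print(stage_preferences(failed, other))                # {'reach': 'tie', 'grasp': 'a', 'place': 'b'}
```

In this toy setting the trajectory-level signal gives the failed rollout no credit for its successful reach and grasp, while the stage-level signal isolates the failure to the final stage. The stage-wise comparison likewise shows that neither rollout is preferred across the board, which a single whole-trajectory preference label would hide.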

Feng Xu, Guangyao Zhai, Xin Kong, Tingzhong Fu, Daniel F.N. Gordon, Xueli An, Benjamin Busam • 2025

Related benchmarks

Task | Dataset | Result | Rank
Robot Manipulation | SimplerEnv WidowX Robot tasks | Average Success Rate: 98.0% | 26
Put Carrot on Plate | SimplerEnv WidowX | Success Rate: 98.7% | 18
Put Eggplant in Basket | SimplerEnv WidowX | Success Rate: 97.5% | 18
Put Spoon on Towel | SimplerEnv WidowX | Success Rate: 98% | 18
Stack Green on Yellow | SimplerEnv WidowX | Success Rate: 98% | 18
Robotic Manipulation | ManiSkill3 | Stack Cube Success Rate: 94.3% | 15