IG-RFT: An Interaction-Guided RL Framework for VLA Models in Long-Horizon Robotic Manipulation

About

Vision-Language-Action (VLA) models have demonstrated significant potential for generalist robotic policies; however, they struggle to generalize to long-horizon complex tasks in novel real-world domains due to distribution shifts and the scarcity of high-quality demonstrations. Although reinforcement learning (RL) offers a promising avenue for policy improvement, applying it to real-world VLA fine-tuning faces challenges regarding exploration efficiency, training stability, and sample cost. To address these issues, we propose IG-RFT, a novel Interaction-Guided Reinforced Fine-Tuning system designed for flow-based VLA models. Firstly, to facilitate effective policy optimization, we introduce Interaction-Guided Advantage Weighted Regression (IG-AWR), an RL algorithm that dynamically modulates exploration intensity based on the robot's interaction status. Furthermore, to address the limitations of sparse or task-specific rewards, we design a novel hybrid dense reward function that integrates the trajectory-level reward and the subtask-level reward. Finally, we construct a three-stage RL system comprising SFT, Offline RL, and Human-in-the-Loop RL for fine-tuning VLA models. Extensive real-world experiments on four challenging long-horizon tasks demonstrate that IG-RFT achieves an average success rate of 85.0%, significantly outperforming SFT (18.8%) and standard Offline RL baselines (40.0%). Ablation studies confirm the critical contributions of IG-AWR and hybrid reward shaping. In summary, our work establishes and validates a novel reinforced fine-tuning system for VLA models in real-world robotic manipulation.
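The abstract does not give the IG-AWR update rule, so the following is only a minimal sketch of how an advantage-weighted regression weight could be modulated by interaction status. The standard AWR weight exp(A / beta) is well established; everything else here is an assumption made for illustration: the boolean `interacting` flag, the two temperatures `beta_free` and `beta_contact`, the clipping value `max_weight`, and the function names themselves.

```python
import math


def awr_weights(advantages, interacting, beta_free=1.0, beta_contact=0.3,
                max_weight=20.0):
    """Per-sample AWR weights exp(A / beta), clipped at max_weight.

    ASSUMPTION: choosing beta from a boolean interaction flag is a
    hypothetical stand-in for the paper's modulation rule, which the
    abstract does not specify. A smaller beta sharpens the weighting
    toward high-advantage actions; a larger beta moves it toward
    uniform (plain imitation) weighting.
    """
    weights = []
    for adv, contact in zip(advantages, interacting):
        beta = beta_contact if contact else beta_free
        weights.append(min(math.exp(adv / beta), max_weight))
    return weights


def weighted_regression_loss(per_sample_losses, weights):
    """Weight-normalized average of per-sample imitation losses.

    ASSUMPTION: per_sample_losses stands in for whatever per-sample
    objective the policy uses (for a flow-based VLA, presumably a
    flow-matching loss); normalizing by the weight sum is one common
    variant, not necessarily the authors' choice.
    """
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / total


# Two transitions with the same advantage, one of them during interaction.
w = awr_weights([0.5, 0.5], [False, True])
loss = weighted_regression_loss([1.0, 3.0], w)
```

With these hypothetical defaults, the transition flagged as interacting receives the larger weight, so its loss term dominates the update.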

Zhian Su, Weijie Kong, Haonan Dong, Huixu Dong • 2026

Related benchmarks

| Task | Dataset | Success Rate (%) | Rank |
|---|---|---|---|
| Block-stacking | Real-world Long-horizon Tasks | 90 | 4 |
| Drink Shelving | Real-world Long-horizon Tasks | 70 | 4 |
| Fruit Bagging | Real-world Long-horizon Tasks | 95 | 4 |
| Long-horizon Manipulation (Average) | Real-world Long-horizon Tasks | 85.0 | 4 |
| Parcel Packing | Real-world Long-horizon Tasks | 85 | 4 |
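As a quick consistency check, the average row can be recomputed from the four per-task success rates listed above. The snippet assumes the per-task values are percentages and that the average is an unweighted mean over tasks; the variable names are illustrative.

```python
# Per-task success rates (percent) as listed in the benchmark table.
success_rates = {
    "Block-stacking": 90,
    "Drink Shelving": 70,
    "Fruit Bagging": 95,
    "Parcel Packing": 85,
}

# Unweighted mean over the four tasks (assumed averaging scheme).
average = sum(success_rates.values()) / len(success_rates)
print(f"{average:.1f}%")  # prints "85.0%"
```

The result matches the 85.0% average success rate reported in the abstract, which is the same figure the table lists as a fraction (0.85).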