ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy

About

Vision-Language-Action (VLA) models have shown substantial potential in real-world robotic manipulation. However, fine-tuning these models through supervised learning struggles to achieve robust performance due to limited, inconsistent demonstrations, especially in contact-rich environments. In this paper, we propose a reinforced fine-tuning approach for VLA models, named ConRFT, which consists of offline and online fine-tuning with a unified consistency-based training objective, to address these challenges. In the offline stage, our method integrates behavior cloning and Q-learning to effectively extract a policy from a small set of demonstrations and stabilize value estimation. In the online stage, the VLA model is further fine-tuned via a consistency policy, with human interventions to ensure safe exploration and high sample efficiency. We evaluate our approach on eight diverse real-world manipulation tasks. It achieves an average success rate of 96.3% within 45-90 minutes of online fine-tuning, outperforming prior supervised methods with a 144% improvement in success rate and 1.9x shorter episode length. This work highlights the potential of integrating reinforcement learning to enhance the performance of VLA models for real-world robotic applications. Videos and code are available at our project website https://cccedric.github.io/conrft/.
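
The offline stage described in the abstract combines behavior cloning with Q-learning. The following is a minimal sketch of one way such a combined policy objective could look, not the authors' implementation: it assumes a plain squared-error behavior-cloning term in place of the paper's consistency-based term, and the function names (`bc_loss`, `combined_policy_loss`) and weights (`bc_weight`, `q_weight`) are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def bc_loss(policy_actions, demo_actions):
    """Mean squared error between policy actions and demonstrated actions.

    Assumption: a simple squared-error stand-in for the behavior-cloning term.
    """
    return fmean(
        sum((p - d) ** 2 for p, d in zip(pa, da))
        for pa, da in zip(policy_actions, demo_actions)
    )


def combined_policy_loss(policy_actions, demo_actions, q_values,
                         bc_weight=1.0, q_weight=1.0):
    """Weighted BC term minus the weighted mean critic value.

    Minimizing this keeps the policy close to the demonstrations while
    preferring actions the critic scores highly. Both weights are
    hypothetical hyperparameters.
    """
    return bc_weight * bc_loss(policy_actions, demo_actions) - q_weight * fmean(q_values)


# Toy batch: two 2-D actions, their demonstrations, and critic values.
policy_actions = [[0.0, 1.0], [1.0, 0.0]]
demo_actions = [[0.0, 1.0], [1.0, 1.0]]
q_values = [0.5, 1.5]
loss = combined_policy_loss(policy_actions, demo_actions, q_values)
```

Raising `bc_weight` pulls the policy toward the demonstrations, while raising `q_weight` lets the critic's value estimates dominate. How the two terms are balanced in each stage is a design choice that this sketch leaves open.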

Yuhui Chen, Shuai Tian, Shugao Liu, Yingting Zhou, Haoran Li, Dongbin Zhao • 2025

Related benchmarks

Task	Dataset	Result
Robotic Manipulation	ManiSkill	StackCube Success Rate: 82.1	9
Navigation	HM3D/MP3D Scene S1 (unseen)	Success Rate (SR): 88	5
Navigation	HM3D/MP3D Scene S3 (unseen)	Success Rate (SR): 87	5
Navigation	HM3D/MP3D Scene S4 (unseen)	Success Rate (SR): 94	5
Navigation	HM3D/MP3D Scene S5 (unseen)	Success Rate (SR): 90	5
Navigation	HM3D/MP3D Scene S2 (unseen)	Success Rate (SR): 83	5
Robotic Manipulation	Close Trashbin Original	Success Rate: 100	4
Robotic Manipulation	Pick Spoon Original	Success Rate: 87	4
Robotic Manipulation	Close Trashbin (Disturbance)	Success Rate: 47	4
Robotic Manipulation	Pick Spoon Disturbance	Success Rate: 60	4

Showing 10 of 18 rows
