ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy

About

Vision-Language-Action (VLA) models have shown substantial potential in real-world robotic manipulation. However, fine-tuning these models through supervised learning struggles to achieve robust performance due to limited, inconsistent demonstrations, especially in contact-rich environments. In this paper, we propose a reinforced fine-tuning approach for VLA models, named ConRFT, which consists of offline and online fine-tuning with a unified consistency-based training objective, to address these challenges. In the offline stage, our method integrates behavior cloning and Q-learning to effectively extract a policy from a small set of demonstrations and stabilize value estimation. In the online stage, the VLA model is further fine-tuned via a consistency policy, with human interventions to ensure safe exploration and high sample efficiency. We evaluate our approach on eight diverse real-world manipulation tasks. It achieves an average success rate of 96.3% within 45-90 minutes of online fine-tuning, outperforming prior supervised methods with a 144% improvement in success rate and 1.9x shorter episode length. This work highlights the potential of integrating reinforcement learning to enhance the performance of VLA models for real-world robotic applications. Videos and code are available at our project website https://cccedric.github.io/conrft/.
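
To make the idea of combining behavior cloning with Q-learning in one objective more concrete, here is a minimal sketch. It is not taken from the ConRFT code: the function names, the squared-error form of the behavior-cloning term, and the weights `bc_weight` and `q_weight` are all illustrative assumptions, and the consistency-policy part of the method is not modeled at all.

```python
# Hypothetical sketch only: names, loss forms, and weights are assumptions
# for illustration and are not taken from the ConRFT paper or code.

def bc_loss(policy_action, demo_action):
    """Mean squared error between the policy's action and the demonstrated action."""
    squared_errors = [(p - d) ** 2 for p, d in zip(policy_action, demo_action)]
    return sum(squared_errors) / len(demo_action)


def combined_policy_loss(policy_action, demo_action, q_value,
                         bc_weight=1.0, q_weight=1.0):
    """Weighted sum of a behavior-cloning term and a Q-learning term.

    The Q term enters with a minus sign: minimizing the loss pushes the
    policy toward actions that the critic scores highly, while the BC term
    keeps it close to the demonstrations.
    """
    return bc_weight * bc_loss(policy_action, demo_action) - q_weight * q_value


# Toy example: a 2-D action, one demonstration, and a made-up critic value.
loss = combined_policy_loss(
    policy_action=[0.5, 0.0],
    demo_action=[1.0, 0.0],
    q_value=2.0,
    bc_weight=1.0,
    q_weight=0.1,
)
print(loss)
```

In this toy form, the two weights set the trade-off between imitating the demonstrations and following the critic; how the actual method balances the two terms is described in the paper, not here.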

Yuhui Chen, Shuai Tian, Shugao Liu, Yingting Zhou, Haoran Li, Dongbin Zhao • 2025

Related benchmarks

| Task | Dataset | Result (Success Rate) | Rank |
| --- | --- | --- | --- |
| Robotic Manipulation | Close Trashbin Original | 100 | 4 |
| Robotic Manipulation | Pick Spoon Original | 87 | 4 |
| Robotic Manipulation | Close Trashbin (Disturbance) | 47 | 4 |
| Robotic Manipulation | Pick Spoon Disturbance | 60 | 4 |
| Robotic Manipulation | Push-T Original | 47 | 4 |
| Robotic Manipulation | Push-T Disturbance | 27 | 4 |
| Robotic Manipulation | Hang Chinese Knot Original | 0 | 4 |
| Robotic Manipulation | Hang Chinese Knot Disturbance | 0 | 4 |
| Double-Fold the Towel | Franka Research 3 | 75 | 2 |
| Insert Two Sockets | Franka Research 3 | 80 | 2 |
Showing 10 of 12 rows
