ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
About
Vision-Language-Action (VLA) models have shown substantial potential in real-world robotic manipulation. However, fine-tuning these models through supervised learning struggles to achieve robust performance because demonstrations are limited and inconsistent, especially in contact-rich environments. In this paper, we propose ConRFT, a reinforced fine-tuning approach for VLA models that addresses these challenges by combining offline and online fine-tuning under a unified consistency-based training objective. In the offline stage, our method integrates behavior cloning and Q-learning to effectively extract a policy from a small set of demonstrations and to stabilize value estimation. In the online stage, the VLA model is further fine-tuned via a consistency policy, with human interventions to ensure safe exploration and high sample efficiency. We evaluate our approach on eight diverse real-world manipulation tasks. It achieves an average success rate of 96.3% within 45-90 minutes of online fine-tuning, outperforming prior supervised methods with a 144% improvement in success rate and 1.9x shorter episode length. This work highlights the potential of integrating reinforcement learning to enhance the performance of VLA models for real-world robotic applications. Videos and code are available at our project website https://cccedric.github.io/conrft/.
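The idea of combining behavior cloning with Q-learning in one objective can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `combined_policy_loss`, the squared-error distance, and the weights `bc_weight` and `q_weight` are all assumptions made for illustration, and the noise-conditioned consistency policy itself is omitted. The sketch only shows how a demonstration-matching term and a value-maximizing term might be weighted against each other.

```python
from typing import Sequence


def combined_policy_loss(
    policy_actions: Sequence[Sequence[float]],
    demo_actions: Sequence[Sequence[float]],
    q_values: Sequence[float],
    bc_weight: float,
    q_weight: float,
) -> float:
    """Weighted sum of a behavior-cloning term and a Q-learning term.

    Hypothetical helper: the squared-error distance and both weights are
    illustrative assumptions, not values or choices taken from the paper.
    """
    n = len(demo_actions)
    # Behavior-cloning term: mean squared distance between the policy's
    # actions and the demonstrated actions.
    bc_term = sum(
        sum((p - d) ** 2 for p, d in zip(pa, da))
        for pa, da in zip(policy_actions, demo_actions)
    ) / n
    # Q term: negative mean critic value of the policy's actions, so that
    # minimizing the loss pushes the policy toward higher-valued actions.
    q_term = -sum(q_values) / n
    return bc_weight * bc_term + q_weight * q_term


# Toy batch of two 2-D actions with made-up critic values.
loss = combined_policy_loss(
    policy_actions=[[0.0, 0.0], [1.0, 1.0]],
    demo_actions=[[0.0, 1.0], [1.0, 1.0]],
    q_values=[2.0, 4.0],
    bc_weight=1.0,
    q_weight=0.5,
)
```

Under this sketch, a larger `bc_weight` keeps the policy close to the demonstrations, while a larger `q_weight` lets the critic drive the update. How the two terms are actually weighted in each stage is specified in the paper, not here.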
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robotic Manipulation | Close Trashbin Original | Success Rate | 100 | 4 |
| Robotic Manipulation | Pick Spoon Original | Success Rate | 87 | 4 |
| Robotic Manipulation | Close Trashbin (Disturbance) | Success Rate | 47 | 4 |
| Robotic Manipulation | Pick Spoon Disturbance | Success Rate | 60 | 4 |
| Robotic Manipulation | Push-T Original | Success Rate | 47 | 4 |
| Robotic Manipulation | Push-T Disturbance | Success Rate | 27 | 4 |
| Robotic Manipulation | Hang Chinese Knot Original | Success Rate | 0 | 4 |
| Robotic Manipulation | Hang Chinese Knot Disturbance | Success Rate | 0 | 4 |
| Double-Fold the Towel | Franka Research 3 | Success Rate | 75 | 2 |
| Insert Two Sockets | Franka Research 3 | Success Rate | 80 | 2 |