Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

About

Recent vision-language-action models (VLAs) build upon pretrained vision-language models and leverage diverse robot datasets to demonstrate strong task execution, language following ability, and semantic generalization. Despite these successes, VLAs struggle with novel robot setups and require fine-tuning to achieve good performance, yet how to most effectively fine-tune them is unclear given many possible strategies. In this work, we study key VLA adaptation design choices such as different action decoding schemes, action representations, and learning objectives for fine-tuning, using OpenVLA as our representative base model. Our empirical analysis informs an Optimized Fine-Tuning (OFT) recipe that integrates parallel decoding, action chunking, a continuous action representation, and a simple L1 regression-based learning objective to altogether improve inference efficiency, policy performance, and flexibility in the model's input-output specifications. We propose OpenVLA-OFT, an instantiation of this recipe, which sets a new state of the art on the LIBERO simulation benchmark, significantly boosting OpenVLA's average success rate across four task suites from 76.5% to 97.1% while increasing action generation throughput by 26$\times$. In real-world evaluations, our fine-tuning recipe enables OpenVLA to successfully execute dexterous, high-frequency control tasks on a bimanual ALOHA robot and outperform other VLAs ($\pi_0$ and RDT-1B) fine-tuned using their default recipes, as well as strong imitation learning policies trained from scratch (Diffusion Policy and ACT) by up to 15% (absolute) in average success rate. We release code for OFT and pretrained model checkpoints at https://openvla-oft.github.io/.
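The recipe named in the abstract combines action chunking, a continuous action representation, and an L1 regression objective. As a rough illustration only, and not the authors' released code, the sketch below computes a mean absolute error over one chunk of continuous actions. The function name `l1_chunk_loss`, the chunk length of 8, and the action dimension of 7 are hypothetical choices made for this example.

```python
from typing import Sequence


def l1_chunk_loss(
    pred_chunk: Sequence[Sequence[float]],
    target_chunk: Sequence[Sequence[float]],
) -> float:
    """Mean absolute error over a chunk of continuous actions.

    Each argument is a list of timesteps, and each timestep is a list of
    continuous action values (for example end-effector deltas and a gripper
    command). This is a hypothetical helper, not part of any OFT API.
    """
    if len(pred_chunk) != len(target_chunk):
        raise ValueError("chunks must contain the same number of timesteps")

    total = 0.0
    count = 0
    for pred_step, target_step in zip(pred_chunk, target_chunk):
        if len(pred_step) != len(target_step):
            raise ValueError("action dimensions must match at every timestep")
        for pred_value, target_value in zip(pred_step, target_step):
            total += abs(pred_value - target_value)
            count += 1

    if count == 0:
        raise ValueError("chunks must not be empty")
    return total / count


# Illustrative sizes (assumptions for this sketch).
CHUNK_LEN = 8   # number of future actions predicted in one forward pass
ACTION_DIM = 7  # continuous values per action

# Ground-truth chunk of zeros and a prediction that is off by 0.5 everywhere.
target = [[0.0] * ACTION_DIM for _ in range(CHUNK_LEN)]
pred = [[0.5] * ACTION_DIM for _ in range(CHUNK_LEN)]

loss = l1_chunk_loss(pred, target)
print(loss)
```

Because the whole chunk is scored in a single call, a model trained this way can emit all actions in the chunk from one forward pass instead of decoding them token by token, which is the intuition behind the throughput gain the abstract reports.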

Moo Jin Kim, Chelsea Finn, Percy Liang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Robot Manipulation | LIBERO | Object Achievement | 98.6 | 957 |
| Robotic Manipulation | LIBERO | Spatial Success Rate | 98 | 527 |
| Robotic Manipulation | LIBERO-Plus | Language Understanding Score | 99 | 249 |
| Robot Manipulation | LIBERO (test) | Average Success Rate | 97.1 | 220 |
| Robotic Manipulation | Calvin ABCD→D | Avg Length | 3.917 | 130 |
| Robot Manipulation | LIBERO Object | Success Rate | 98.3 | 127 |
| Robot Manipulation | LIBERO | Spatial Success Rate | 97.6 | 116 |
| Robotic Manipulation | RoboTwin 2.0 | Average Success Rate | 72.3 | 100 |
| Robot Manipulation | SimplerEnv WidowX | Success Rate: Put Spoon on Towel | 34.2 | 98 |
| Robotic Manipulation | LIBERO Long | -- | -- | 91 |
Showing 10 of 223 rows
