GRAPE: Generalizing Robot Policy via Preference Alignment

About

Despite recent advances in vision-language-action (VLA) models on a variety of robotics tasks, these models suffer from critical issues such as poor generalizability to unseen tasks, due to their reliance on behavior cloning exclusively from successful rollouts. Furthermore, they are typically fine-tuned to replicate demonstrations collected by experts under different settings, which introduces distribution bias and limits their adaptability to diverse manipulation objectives, such as efficiency, safety, and task completion. To bridge this gap, we introduce GRAPE: Generalizing Robot Policy via Preference Alignment. Specifically, GRAPE aligns VLAs at the trajectory level and implicitly models reward from both successful and failed trials to boost generalizability to diverse tasks. Moreover, GRAPE breaks down complex manipulation tasks into independent stages and automatically guides preference modeling through customized spatiotemporal constraints with keypoints proposed by a large vision-language model. Notably, these constraints are flexible and can be customized to align the model with varying objectives, such as safety, efficiency, or task success. We evaluate GRAPE across a diverse array of tasks in both real-world and simulated environments. Experimental results demonstrate that GRAPE enhances the performance of state-of-the-art VLA models, increasing success rates on in-domain and unseen manipulation tasks by 51.79% and 58.20%, respectively. Additionally, GRAPE can be aligned with various objectives, such as safety and efficiency, reducing collision rates by 37.44% and rollout step-length by 11.15%, respectively. All code, models, and data are available at https://grape-vla.github.io/
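
The abstract does not spell out the training objective, so the following is only a minimal sketch of what trajectory-level preference alignment with an implicitly modeled reward could look like. It assumes a DPO-style loss in which a trajectory's log-probability is the sum of its per-step action log-probabilities. The function names, the `beta` parameter, and the example numbers are illustrative assumptions, not taken from the GRAPE code release.

```python
import math
from typing import Sequence


def trajectory_log_prob(step_log_probs: Sequence[float]) -> float:
    """Log-probability of a whole trajectory: sum of per-step action log-probs."""
    return sum(step_log_probs)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def trajectory_preference_loss(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # assumed temperature; not a value from the paper
) -> float:
    """DPO-style loss on one (preferred, dispreferred) trajectory pair.

    The implicit reward of a trajectory is beta times the log-ratio between
    the policy being trained and a frozen reference policy.
    """
    chosen_reward = beta * (
        trajectory_log_prob(policy_chosen) - trajectory_log_prob(ref_chosen)
    )
    rejected_reward = beta * (
        trajectory_log_prob(policy_rejected) - trajectory_log_prob(ref_rejected)
    )
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


if __name__ == "__main__":
    # Hypothetical per-step log-probs for a successful and a failed rollout.
    ref_success, ref_failure = [-1.0, -2.0], [-1.5, -0.5]
    pol_success, pol_failure = [-0.5, -1.0], [-2.0, -1.5]
    print(trajectory_preference_loss(pol_success, pol_failure, ref_success, ref_failure))
```

Under this assumed objective, the loss falls as the policy assigns relatively more probability to the preferred trajectory than the reference does, so failed rollouts contribute a training signal instead of being discarded.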

Zijian Zhang, Kaiyuan Zheng, Zhaorun Chen, Joel Jang, Yi Li, Siwei Han, Chaoqi Wang, Mingyu Ding, Dieter Fox, Huaxiu Yao • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Robot Manipulation | LIBERO | Goal Achievement | 82.2 | 700 |
| Robot Manipulation | LIBERO Object | Success Rate | 91.2 | 70 |
| Robotic Manipulation | LIBERO (test) | Object Success Rate | 92.1 | 45 |
| Robotic Manipulation | LIBERO Long | Success Rate | 55.8 | 44 |
| Robot Manipulation | SimplerEnv WidowX Robot tasks | Average Success Rate | 5.36e+3 | 32 |
| Robotic Manipulation | LIBERO Goal | Success Rate | 82.2 | 21 |
| Robotic Manipulation | ManiSkill3 | Average Success Rate | 22.6 | 21 |
| Put Eggplant in Basket | SimplerEnv WidowX | Success Rate | 58.7 | 18 |
| Stack Green on Yellow | SimplerEnv WidowX | Success Rate | 48.3 | 18 |
| Put Carrot on Plate | SimplerEnv WidowX | Success Rate | 0.593 | 18 |
Showing 10 of 12 rows
