TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

About

Although large vision-language-action (VLA) models pretrained on extensive robot datasets offer promising generalist policies for robotic learning, they still struggle with spatial-temporal dynamics in interactive robotics, making them less effective in handling complex tasks, such as manipulation. In this work, we introduce visual trace prompting, a simple yet effective approach to facilitate VLA models' spatial-temporal awareness for action prediction by encoding state-action trajectories visually. We develop a new TraceVLA model by finetuning OpenVLA on our own collected dataset of 150K robot manipulation trajectories using visual trace prompting. Evaluations of TraceVLA across 137 configurations in SimplerEnv and 4 tasks on a physical WidowX robot demonstrate state-of-the-art performance, outperforming OpenVLA by 10% on SimplerEnv and 3.5x on real-robot tasks and exhibiting robust generalization across diverse embodiments and scenarios. To further validate the effectiveness and generality of our method, we present a compact VLA model based on 4B Phi-3-Vision, pretrained on Open-X-Embodiment and finetuned on our dataset, which rivals the 7B OpenVLA baseline while significantly improving inference efficiency.

Ruijie Zheng, Yongyuan Liang, Shuaiyi Huang, Jianfeng Gao, Hal Daumé III, Andrey Kolobov, Furong Huang, Jianwei Yang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Robot Manipulation | LIBERO | Goal Achievement | 75.1 | 700 |
| Robotic Manipulation | LIBERO | Spatial Success Rate | 84.9 | 314 |
| Robot Manipulation | LIBERO (test) | Average Success Rate | 74.8 | 184 |
| Robot Manipulation | SimplerEnv WidowX Robot tasks (test) | Success Rate (Spoon) | 12.5 | 79 |
| Robot Manipulation | SimplerEnv Google Robot tasks Variant Aggregation | Average Success Rate | 45 | 67 |
| Robot Policy Learning | LIBERO | S (Spatial) Rate | 84.6 | 65 |
| Robot Manipulation | SimplerEnv Google Robot tasks Visual Matching | Pick Coke Can Success Rate | 28 | 62 |
| Robot Manipulation | SimplerEnv WidowX | Success Rate: Put Spoon on Towel | 12.5 | 58 |
| Robotic Manipulation | LIBERO v1 (test) | Average Success Rate | 78.1 | 46 |
| Robotic Manipulation | LIBERO (test) | Object Success Rate | 85.2 | 45 |
Showing 10 of 52 rows
