Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning

About

In this study, we are interested in imbuing robots with the capability of physically-grounded task planning. Recent advancements have shown that large language models (LLMs) possess extensive knowledge useful in robotic tasks, especially in reasoning and planning. However, LLMs are constrained by their lack of world grounding and dependence on external affordance models to perceive environmental information, which cannot jointly reason with LLMs. We argue that a task planner should be an inherently grounded, unified multimodal system. To this end, we introduce Robotic Vision-Language Planning (ViLa), a novel approach for long-horizon robotic planning that leverages vision-language models (VLMs) to generate a sequence of actionable steps. ViLa directly integrates perceptual data into its reasoning and planning process, enabling a profound understanding of commonsense knowledge in the visual world, including spatial layouts and object attributes. It also supports flexible multimodal goal specification and naturally incorporates visual feedback. Our extensive evaluation, conducted in both real-robot and simulated environments, demonstrates ViLa's superiority over existing LLM-based planners, highlighting its effectiveness in a wide array of open-world manipulation tasks.

Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, Yang Gao • 2023

Related benchmarks

Task | Dataset | Result | Rank
Robotic Planning | Robotouille Easy | Solved Rate: 46 | 7
Robotic Planning | Robotouille Hard | Solved Rate: 13.9 | 7
Robotic Planning | Robotouille Impossible | Solved Percentage: 20 | 7
Robot Skill Learning | Franka Emika Research Robot Environment Panda In-domain 3 | Success Rate: 46.7 | 6
Task Planning | Table Clean Real (train) | Success Rate: 20.7 | 6
Task Planning | Table Clean Real (test) | Success Rate: 11.3 | 6
Robot Skill Learning | Franka Emika Research 3 (Panda) Robot Environment Generalization | Success Rate: 0.067 | 6
Robot Skill Learning | Franka Emika Research Panda Impossible 3 | Solved Rate: 6.7 | 6