MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting

About

Open-world generalization requires robotic systems to have a profound understanding of the physical world and the user command to solve diverse and complex tasks. While the recent advancement in vision-language models (VLMs) has offered unprecedented opportunities to solve open-world problems, how to leverage their capabilities to control robots remains a grand challenge. In this paper, we introduce Marking Open-world Keypoint Affordances (MOKA), an approach that employs VLMs to solve robotic manipulation tasks specified by free-form language instructions. Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world. By prompting the pre-trained VLM, our approach utilizes the VLM's commonsense knowledge and concept understanding acquired from broad data sources to predict affordances and generate motions. To facilitate the VLM's reasoning in zero-shot and few-shot manners, we propose a visual prompting technique that annotates marks on images, converting affordance reasoning into a series of visual question-answering problems that are solvable by the VLM. We further explore methods to enhance performance with robot experiences collected by MOKA through in-context learning and policy distillation. We evaluate and analyze MOKA's performance on various table-top manipulation tasks including tool use, deformable body manipulation, and object rearrangement.
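The mark-based prompting idea described above can be illustrated with a minimal sketch. It assumes candidate keypoints have already been extracted as pixel coordinates, and it covers only the bookkeeping: labeling the points, phrasing the question, and mapping the answer back to a pixel location. The names `Mark`, `make_marks`, `build_question`, and `resolve_answer` are hypothetical and do not come from the paper; drawing the marks on the image and calling the VLM are omitted.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """A labeled candidate keypoint in image (pixel) coordinates."""
    label: str  # text that would be drawn next to the point, e.g. "P1"
    x: int
    y: int


def make_marks(points, prefix="P"):
    """Assign a short label (P1, P2, ...) to each candidate keypoint."""
    return [Mark(f"{prefix}{i + 1}", x, y) for i, (x, y) in enumerate(points)]


def build_question(instruction, marks):
    """Phrase keypoint selection as a visual question over the labeled marks."""
    labels = ", ".join(m.label for m in marks)
    return (
        f"Task: {instruction}\n"
        f"The image shows candidate points labeled {labels}.\n"
        "Which labeled point should the gripper grasp? Answer with one label."
    )


def resolve_answer(answer, marks):
    """Map the model's free-text answer to pixel coordinates.

    Returns the (x, y) of the first known label mentioned in the answer,
    or None if the answer names no known label.
    """
    by_label = {m.label: m for m in marks}
    for token in re.findall(r"[A-Za-z]+\d+", answer):
        if token in by_label:
            mark = by_label[token]
            return (mark.x, mark.y)
    return None
```

In use, the question string and the annotated image would be sent to a VLM of choice, and the selected point would then be lifted from pixel coordinates to a robot-frame target. Both steps depend on the specific model and camera setup and are left out here.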

Fangchen Liu, Kuan Fang, Pieter Abbeel, Sergey Levine • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robotic Manipulation | SimplerEnv WidowX Robot | Success Rate: Put Spoon on Towel: 45.8 | 12 |
| Move the moka pot to the right of drawer | xArm 6 Real-world Tabletop | Grasp Success Rate: 16.7 | 5 |
| Move the nearest object to the right side of the drawer | xArm 6 Real-world Tabletop | Object Correctness: 83.3 | 5 |
| Pick the [x] toothbrush and place it to the bucket | xArm 6 Real-world Tabletop | Correct Object Pick: 16.7 | 5 |
| Place the fork in the green bin | xArm 6 Real-world Tabletop | Grasp Success Rate: 16.7 | 5 |
| Put the screwdriver between drawer and the vase | xArm 6 Real-world Tabletop | Grasp Success Rate: 83.3 | 5 |
| Robotic Manipulation | Task VII | Success Rate: 0 | 5 |
| Robotic Manipulation | Task VIII | Success Rate: 50 | 5 |
| Move the egg to the bowl | xArm 6 Real-world Tabletop | Success Rate: 40 | 5 |
| Move the vise to the red basket | xArm 6 Real-world Tabletop | Grasp Success Rate: 0 | 5 |
Showing 10 of 20 rows
