
OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping

About

Recognizing and grasping novel-category objects remains a crucial yet challenging problem in real-world robotic applications. Despite its significance, limited research has been conducted in this specific domain. To address this, we propose a novel framework that seamlessly integrates open-vocabulary learning into robotic grasping, empowering robots to handle novel objects. Our contributions are threefold. First, we present a large-scale benchmark dataset specifically tailored for evaluating open-vocabulary grasping tasks. Second, we propose a unified visual-linguistic framework that guides robots in successfully grasping both base and novel objects. Third, we introduce two alignment modules designed to enhance visual-linguistic perception during grasping. Extensive experiments validate the efficacy and utility of our approach. Notably, our framework achieves an average accuracy of 71.2% and 64.4% on base and novel categories of our new dataset, respectively.
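The abstract does not spell out how the alignment modules work, but the general idea behind visual-linguistic alignment for grasping can be sketched as scoring candidate object regions against a language query in a shared embedding space and grasping the best match. The function and array names below are illustrative assumptions, not code from OVGNet:

```python
import numpy as np

def align_regions_to_query(region_feats, text_feat):
    """Score each visual region embedding against a language-query
    embedding by cosine similarity, returning the index of the
    best-matching region (the candidate grasp target) and all scores."""
    # L2-normalize so dot products become cosine similarities.
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    scores = r @ t                      # one similarity per region
    return int(np.argmax(scores)), scores

# Toy example: three 2-D region embeddings and a query embedding
# that lies closest in direction to region 1.
regions = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.55, 0.84])
best, scores = align_regions_to_query(regions, query)
```

In a real open-vocabulary pipeline the region features would come from a visual backbone and the query embedding from a pretrained text encoder; the matching step itself reduces to this similarity ranking.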

Li Meng, Zhao Qi, Lyu Shuchang, Wang Chunlei, Ma Yujing, Cheng Guangliang, Yang Chenguang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| grasp a ball (CI) | Heavy Clutter (test) | Average Success Rate | 0.0 | 8 |
| Robotic Grasping | Simulated Cluttered Environment (Overall Averages) | Average Success | 43.8 | 7 |
| Robotic Grasping | Simulated Cluttered Environment, Heavy Clutter (Overall Averages) | Average Success Rate | 0.0 | 7 |
| get something to hold other things | Heavy Clutter (test) | Average Success | 0.0 | 4 |
| I need a fruit (CI) | Heavy Clutter (test) | Average Success | 0.0 | 4 |
| grasp a ball | Heavy Clutter (test) | Average Success | 0.0 | 4 |
| I need a fruit | Heavy Clutter (test) | Average Success | 0.0 | 4 |
