
OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping

About

Recognizing and grasping novel-category objects remains a crucial yet challenging problem in real-world robotic applications. Despite its significance, limited research has been conducted in this specific domain. To address this, we propose a novel framework that seamlessly integrates open-vocabulary learning into robotic grasping, empowering robots to handle novel objects. Our contributions are threefold. First, we present a large-scale benchmark dataset specifically tailored for evaluating open-vocabulary grasping tasks. Second, we propose a unified visual-linguistic framework that guides robots in successfully grasping both base and novel objects. Third, we introduce two alignment modules designed to enhance visual-linguistic perception during grasping. Extensive experiments validate the efficacy and utility of our approach. Notably, our framework achieves an average accuracy of 71.2% and 64.4% on base and novel categories of our new dataset, respectively.
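The abstract does not spell out how the alignment modules work, but the general idea behind visual-linguistic alignment for grasping can be sketched as scoring candidate object regions against a language query in a shared embedding space and grasping the best match. The function and array names below are illustrative assumptions, not code from OVGNet:

```python
import numpy as np

def align_regions_to_query(region_feats, text_feat):
    """Score each visual region embedding against a language-query
    embedding by cosine similarity, returning the index of the
    best-matching region (the candidate grasp target) and all scores."""
    # L2-normalize so dot products become cosine similarities.
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    scores = r @ t                      # one similarity per region
    return int(np.argmax(scores)), scores

# Toy example: three 2-D region embeddings and a query embedding
# that lies closest in direction to region 1.
regions = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.55, 0.84])
best, scores = align_regions_to_query(regions, query)
```

In a real open-vocabulary pipeline the region features would come from a visual backbone and the query embedding from a pretrained text encoder; the matching step itself reduces to this similarity ranking.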

Li Meng, Zhao Qi, Lyu Shuchang, Wang Chunlei, Ma Yujing, Cheng Guangliang, Yang Chenguang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| grasp a ball (CI) | Heavy Clutter (test) | Average Success Rate | 0.0 | 8 |
| Robotic Grasping | Simulated Cluttered Environment (Overall Averages) | Average Success | 43.8 | 7 |
| Robotic Grasping | Simulated Cluttered Environment, Heavy Clutter (Overall Averages) | Average Success Rate | 0.0 | 7 |
| get something to hold other things | Heavy Clutter (test) | Average Success | 0.0 | 4 |
| I need a fruit (CI) | Heavy Clutter (test) | Average Success | 0.0 | 4 |
| grasp a ball | Heavy Clutter (test) | Average Success | 0.0 | 4 |
| I need a fruit | Heavy Clutter (test) | Average Success | 0.0 | 4 |
