
Target-Oriented Object Grasping via Multimodal Human Guidance

About

In human-robot interaction and collaboration scenarios, robotic grasping still faces numerous challenges. Traditional grasp detection methods generally analyze the entire scene to predict grasps, leading to redundancy and inefficiency. In this work, we reconsider 6-DoF grasp detection from a target-referenced perspective and propose a Target-Oriented Grasp Network (TOGNet). TOGNet operates on local, object-agnostic region patches, predicting grasps more efficiently, and integrates seamlessly with multimodal human guidance, including language instructions, pointing gestures, and interactive clicks. Our system thus comprises two primary functional modules: a guidance module that localizes the target object in 3D space, and TOGNet, which detects region-focal 6-DoF grasps around the target to enable subsequent motion planning. Across 50 target-grasping simulation experiments in cluttered scenes, our system improves the success rate by about 13.7%. In real-world experiments, we demonstrate that our method excels in various target-oriented grasping scenarios.
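The pipeline described above first localizes a target point from human guidance, then extracts a local, object-agnostic region patch around it rather than processing the whole scene. A minimal sketch of that patch-extraction step is shown below; the function name, radius, and patch size are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def crop_region_patch(points, target_center, radius=0.1, num_points=1024):
    """Crop a local region patch around a guided target point.

    Hypothetical stand-in for the patch extraction a network like TOGNet
    would consume. `points` is an (N, 3) scene point cloud; `target_center`
    is the 3D point produced by the guidance module (language instruction,
    pointing gesture, or interactive click).
    """
    # Keep only points within `radius` of the target.
    dists = np.linalg.norm(points - target_center, axis=1)
    patch = points[dists < radius]
    if len(patch) == 0:
        return np.empty((0, 3))
    # Resample to a fixed size so the network sees a constant input shape;
    # sample with replacement when the patch is smaller than num_points.
    idx = np.random.choice(len(patch), num_points, replace=len(patch) < num_points)
    return patch[idx]

# Toy scene: 5000 random points in a 1 m cube, target at the center.
rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 1.0, size=(5000, 3))
patch = crop_region_patch(scene, np.array([0.5, 0.5, 0.5]), radius=0.1)
print(patch.shape)  # (1024, 3)
```

Restricting grasp prediction to such fixed-size local patches is what makes the approach object-agnostic: the downstream network never needs to reason about the full cluttered scene.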

Pengwei Xie, Siang Chen, Dingchang Hu, Yixiang Dai, Kaiqin Yang, Guijin Wang • 2024

Related benchmarks

| Task            | Dataset                               | Metric       | Result | Rank |
| Grasp Detection | GraspNet-1Billion (RealSense)         | AP (Average) | 40.73  | 32   |
| Grasp Detection | GraspNet-1Billion RealSense (Novel)   | AP           | 23.74  | 25   |
| Grasp Detection | GraspNet-1Billion RealSense (Seen)    | AP           | 51.84  | 25   |
| Grasp Detection | GraspNet-1Billion RealSense (Similar) | AP           | 0.4662 | 25   |
| Grasp Detection | GraspNet-1Billion Kinect (Seen)       | AP           | 49.6   | 23   |
| Grasp Detection | GraspNet-1Billion Kinect (Similar)    | AP           | 40.03  | 13   |
| Grasp Detection | GraspNet-1Billion Kinect (Novel)      | AP           | 19.58  | 13   |
| Grasp Detection | GraspNet-1Billion Kinect              | --           | --     | 9    |
