
Target-Oriented Object Grasping via Multimodal Human Guidance

About

In human-robot interaction and collaboration scenarios, robotic grasping still faces numerous challenges. Traditional grasp detection methods generally analyze the entire scene to predict grasps, leading to redundancy and inefficiency. In this work, we reconsider 6-DoF grasp detection from a target-referenced perspective and propose a Target-Oriented Grasp Network (TOGNet). TOGNet operates on local, object-agnostic region patches, predicting grasps more efficiently, and integrates seamlessly with multimodal human guidance, including language instructions, pointing gestures, and interactive clicks. Our system thus comprises two primary functional modules: a guidance module that localizes the target object in 3D space, and TOGNet, which detects region-focal 6-DoF grasps around the target to enable subsequent motion planning. Across 50 target-grasping simulation experiments in cluttered scenes, our system improves the success rate by about 13.7%. In real-world experiments, we demonstrate that our method excels in various target-oriented grasping scenarios.
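The pipeline described above first localizes a target point from human guidance, then extracts a local, object-agnostic region patch around it rather than processing the whole scene. A minimal sketch of that patch-extraction step is shown below; the function name, radius, and patch size are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def crop_region_patch(points, target_center, radius=0.1, num_points=1024):
    """Crop a local region patch around a guided target point.

    Hypothetical stand-in for the patch extraction a network like TOGNet
    would consume. `points` is an (N, 3) scene point cloud; `target_center`
    is the 3D point produced by the guidance module (language instruction,
    pointing gesture, or interactive click).
    """
    # Keep only points within `radius` of the target.
    dists = np.linalg.norm(points - target_center, axis=1)
    patch = points[dists < radius]
    if len(patch) == 0:
        return np.empty((0, 3))
    # Resample to a fixed size so the network sees a constant input shape;
    # sample with replacement when the patch is smaller than num_points.
    idx = np.random.choice(len(patch), num_points, replace=len(patch) < num_points)
    return patch[idx]

# Toy scene: 5000 random points in a 1 m cube, target at the center.
rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 1.0, size=(5000, 3))
patch = crop_region_patch(scene, np.array([0.5, 0.5, 0.5]), radius=0.1)
print(patch.shape)  # (1024, 3)
```

Restricting grasp prediction to such fixed-size local patches is what makes the approach object-agnostic: the downstream network never needs to reason about the full cluttered scene.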

Pengwei Xie, Siang Chen, Dingchang Hu, Yixiang Dai, Kaiqin Yang, Guijin Wang • 2024

Related benchmarks

| Task            | Dataset                               | Metric       | Result | Rank |
| Grasp Detection | GraspNet-1Billion (RealSense)         | AP (Average) | 40.73  | 32   |
| Grasp Detection | GraspNet-1Billion RealSense (Novel)   | AP           | 23.74  | 25   |
| Grasp Detection | GraspNet-1Billion RealSense (Seen)    | AP           | 51.84  | 25   |
| Grasp Detection | GraspNet-1Billion RealSense (Similar) | AP           | 0.4662 | 25   |
| Grasp Detection | GraspNet-1Billion Kinect (Seen)       | AP           | 49.6   | 23   |
| Grasp Detection | GraspNet-1Billion Kinect (Similar)    | AP           | 40.03  | 13   |
| Grasp Detection | GraspNet-1Billion Kinect (Novel)      | AP           | 19.58  | 13   |
| Grasp Detection | GraspNet-1Billion Kinect              | --           | --     | 9    |
