Rethinking 6-Dof Grasp Detection: A Flexible Framework for High-Quality Grasping
About
Robotic grasping is a primitive skill for complex tasks and is fundamental to intelligence. For general 6-DoF grasping, most previous methods directly extract scene-level semantic or geometric information, and few consider their suitability for downstream applications such as target-oriented grasping. To address this issue, we rethink 6-DoF grasp detection from a grasp-centric view and propose a versatile framework that handles both scene-level and target-oriented grasping. Our framework, FlexLoG, consists of a Flexible Guidance Module and a Local Grasp Model. The Flexible Guidance Module accepts both global guidance (e.g., a grasp heatmap) and local guidance (e.g., visual grounding), enabling the generation of high-quality grasps across various tasks. The Local Grasp Model operates on object-agnostic regional points and predicts grasps locally, concentrating on one region at a time. Experimental results show that our framework achieves improvements of over 18% and 23% on the unseen splits of the GraspNet-1Billion dataset. Furthermore, real-world robotic tests in three distinct settings achieve a 95% success rate.
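The abstract does not specify how guidance is turned into regional points. As a minimal sketch only, assuming that global guidance is a per-point heatmap score, that local guidance is a per-point boolean mask (e.g. produced by a visual-grounding step), and that a "region" is every point within a fixed radius of a selected center, the selection step might look like this. The function names, `k`, and `radius` are hypothetical, and the learned Local Grasp Model that would consume each region is not shown.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def select_centers(scores: Sequence[float], k: int,
                   mask: Optional[Sequence[bool]] = None) -> List[int]:
    """Return indices of the k highest-scoring candidate points.

    scores: hypothetical per-point grasp heatmap (global guidance).
    mask:   optional per-point flags (hypothetical local guidance);
            when given, only flagged points can become centers.
    """
    candidates = [i for i in range(len(scores)) if mask is None or mask[i]]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:k]


def crop_region(points: Sequence[Point], center: Point,
                radius: float) -> List[Point]:
    """Collect all points within `radius` of `center`.

    The crop uses geometry only and never looks at object labels,
    which is one way to read "object-agnostic regional points".
    """
    return [p for p in points if math.dist(p, center) <= radius]


def local_regions(points: Sequence[Point], scores: Sequence[float], k: int,
                  radius: float, mask: Optional[Sequence[bool]] = None
                  ) -> List[Tuple[int, List[Point]]]:
    """Pair each selected center index with its local point region."""
    centers = select_centers(scores, k, mask)
    return [(i, crop_region(points, points[i], radius)) for i in centers]


points = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.5, 0.0, 0.0), (0.51, 0.0, 0.0)]
scores = [0.9, 0.2, 0.8, 0.1]
regions = local_regions(points, scores, k=2, radius=0.05)
print([(i, len(r)) for i, r in regions])  # [(0, 2), (2, 2)]
```

Swapping the heatmap for a mask changes only which points may become centers, while the cropping step stays the same. This mirrors the idea that one local model can serve both scene-level and target-oriented grasping.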
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Grasp Detection | GraspNet-1Billion (RealSense) | AP (Average) | 56.02 | 32 |
| Grasp Detection | GraspNet-1Billion RealSense (Seen) | AP | 50.57 | 25 |
| Grasp Detection | GraspNet-1Billion RealSense Similar | AP | 0.4459 | 25 |
| Grasp Detection | GraspNet-1Billion RealSense Novel | AP | 22.59 | 25 |
| Grasp Detection | GraspNet-1Billion Kinect camera (seen) | AP | 44.67 | 23 |
| Grasp Detection | GraspNet-1Billion Kinect camera (Similar split) | AP | 39.37 | 13 |
| Grasp Detection | GraspNet-1Billion Kinect camera (Novel) | AP | 16.04 | 13 |
| Grasp Pose Detection | GraspNet-1Billion Kinect 1.0 | AP (Seen) | 69.44 | 12 |
| Grasp Detection | GraspNet-1Billion Kinect | AP (Seen) | 0.6944 | 9 |
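The AP figures above follow the GraspNet-1Billion evaluation protocol. As we understand it, predicted grasps are sorted by confidence, Precision@k is computed for the top-k grasps with k from 1 to 50, AP at a friction coefficient μ is the mean of those precisions, and the reported AP averages over μ in {0.2, 0.4, ..., 1.2}. The sketch below illustrates that averaging for a single image only. The per-grasp input (the smallest friction coefficient at which a grasp succeeds, with values ≤ 0 meaning failure) and the choice to count missing predictions as failures are assumptions, and the official toolkit additionally handles collision checks, non-maximum suppression, and averaging across scenes.

```python
from typing import Sequence

# Friction coefficients assumed to be used by the GraspNet-1Billion AP.
MUS = (0.2, 0.4, 0.6, 0.8, 1.0, 1.2)


def grasp_ap(min_mu: Sequence[float], top_k: int = 50,
             mus: Sequence[float] = MUS) -> float:
    """Average Precision@k over k = 1..top_k, then over friction coefficients.

    min_mu: confidence-sorted predictions; each entry is the smallest
            friction coefficient at which that grasp succeeds
            (a value <= 0 marks a grasp that never succeeds).
    Missing predictions (fewer than top_k grasps) count as failures.
    """
    ap_per_mu = []
    for mu in mus:
        precisions = []
        for k in range(1, top_k + 1):
            hits = sum(1 for m in min_mu[:k] if 0 < m <= mu)
            precisions.append(hits / k)
        ap_per_mu.append(sum(precisions) / top_k)
    return sum(ap_per_mu) / len(mus)


# One grasp that needs mu >= 0.8 succeeds for 3 of the 6 coefficients.
print(grasp_ap([0.8], top_k=1))  # 0.5
```

Because a grasp that succeeds at a low μ also succeeds at every higher μ, the average rewards grasps that remain stable under low friction.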