
GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision

About

We study the hard problem of 3D object segmentation in complex point clouds without requiring human labels of 3D scenes for supervision. Existing unsupervised methods group 3D points into objects by relying on the similarity of pretrained 2D features or on external signals such as motion. As a result, they are usually limited to identifying simple objects such as cars, or their segmentations are often inferior because pretrained features lack objectness. In this paper, we propose a new two-stage pipeline called GrabS. The core idea is to learn generative and discriminative object-centric priors from object datasets in the first stage, and then, in the second stage, to design an embodied agent that learns to discover multiple objects by querying against the pretrained generative priors. We extensively evaluate our method on two real-world datasets and a newly created synthetic dataset, where it clearly surpasses all existing unsupervised methods in segmentation performance.

Zihui Zhang, Yafei Yang, Hongtao Wen, Bo Yang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| 3D Instance Segmentation | S3DIS (Area 5) | mAP@50% IoU 66.2 | 120 |
| 3D Instance Segmentation | S3DIS (6-fold CV) | -- | 92 |
| 3D object segmentation | ScanNet 2017 (val) | AP 14 | 11 |
| 3D Instance Segmentation | synthetic multi-class dataset (test) | AP 59.5 | 8 |
| Class-agnostic 3D object segmentation | ScanNet 8 (val) | AP 47.1 | 8 |
| 3D Instance Segmentation | ScanNet (hidden test) | AP 29 | 8 |
| 3D object segmentation | ScanNet200 (val) | AP 7.5 | 8 |
| 3D Instance Segmentation | ScanNet++ (val) | mAP 19.8 | 7 |
| 3D object segmentation | S3DIS (Area 6) | AP 4.3 | 7 |
| 3D object segmentation | S3DIS (Area 3) | AP 4.8 | 7 |
Showing 10 of 14 rows
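
The results above are reported as AP or mAP at an IoU threshold. As a rough illustration of what such a number measures, the following minimal sketch computes a non-interpolated average precision for class-agnostic instance masks at a single IoU threshold. It is a simplification, not the official evaluation script of any benchmark listed here. The function names `iou` and `average_precision`, and the representation of each mask as a set of point indices, are assumptions made for this example.

```python
from typing import List, Set, Tuple


def iou(a: Set[int], b: Set[int]) -> float:
    """Intersection over union of two masks given as sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(
    preds: List[Tuple[float, Set[int]]],  # (confidence score, predicted mask)
    gts: List[Set[int]],                  # ground-truth object masks
    iou_thresh: float = 0.5,
) -> float:
    """Non-interpolated AP at one IoU threshold (illustrative only)."""
    if not gts:
        return 0.0
    matched = set()   # indices of ground-truth objects already claimed
    hits = []         # True/False per prediction, in descending score order
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            v = iou(mask, gt)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)   # each ground-truth object matches at most once
            hits.append(True)
        else:
            hits.append(False)
    # Sum precision at the rank of every true positive, normalised by #objects.
    ap, tp = 0.0, 0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            ap += tp / rank
    return ap / len(gts)


if __name__ == "__main__":
    gts = [{0, 1, 2, 3}, {10, 11, 12, 13}]
    preds = [(0.9, {0, 1, 2}), (0.8, {20, 21}), (0.7, {10, 11, 12, 13})]
    print(average_precision(preds, gts, iou_thresh=0.5))
```

In this toy input the highest-scoring prediction overlaps the first object with IoU 0.75, the second prediction matches nothing, and the third matches the second object exactly. Raising `iou_thresh` therefore turns the first prediction into a false positive and lowers the score.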
