FoundObj: Self-supervised Foundation Models as Rewards for Label-free 3D Object Segmentation
About
We address the challenging task of 3D object segmentation in complex scene point clouds without relying on any scene-level human annotations during training. Existing methods are typically constrained to identifying simple objects, primarily due to insufficient object priors in the learning process. In this paper, we present FoundObj, a novel framework featuring a superpoint-based object discovery agent that incrementally merges suitable neighboring superpoints, guided by our innovative semantic and geometric reward modules. These modules synergistically leverage semantic and geometric priors from self-supervised 2D/3D foundation models, providing complementary feedback to the object discovery agent and enabling robust identification of multi-class objects through reinforcement learning. Extensive experiments on diverse benchmarks demonstrate that our approach consistently outperforms existing baselines. Notably, our method exhibits strong generalization in zero-shot and long-tail scenarios, underscoring its potential for scalable, label-free 3D object segmentation.
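
The incremental merging of neighboring superpoints described above can be illustrated with a small sketch. This is not the FoundObj agent: the learned reinforcement-learning policy is replaced by a plain greedy rule, and `toy_reward` is a hypothetical stand-in for the semantic and geometric reward modules, which the abstract does not specify. The names `greedy_merge`, `FEATURES`, and `threshold` are likewise assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Segment = FrozenSet[int]  # a candidate object: a set of superpoint ids


def greedy_merge(
    num_superpoints: int,
    edges: List[Tuple[int, int]],
    reward: Callable[[Segment, Segment], float],
    threshold: float = 0.0,
) -> List[Segment]:
    """Repeatedly merge the adjacent pair of segments with the highest reward.

    Stops when no adjacent pair scores above `threshold`.
    """
    # Every superpoint starts as its own segment.
    segment_of: Dict[int, Segment] = {
        i: frozenset([i]) for i in range(num_superpoints)
    }
    while True:
        best_pair: Optional[Tuple[Segment, Segment]] = None
        best_reward = threshold
        for a, b in edges:
            seg_a, seg_b = segment_of[a], segment_of[b]
            if seg_a == seg_b:
                continue  # both endpoints already belong to one segment
            r = reward(seg_a, seg_b)
            if r > best_reward:
                best_reward, best_pair = r, (seg_a, seg_b)
        if best_pair is None:
            break  # no remaining merge is worth taking
        merged = best_pair[0] | best_pair[1]
        for i in merged:
            segment_of[i] = merged
    return sorted(set(segment_of.values()), key=min)


# Hypothetical 1-D feature per superpoint (assumption: in a real system this
# would come from foundation-model features, not hand-written numbers).
FEATURES = {0: 1.0, 1: 1.1, 2: 5.0, 3: 5.2}


def toy_reward(seg_a: Segment, seg_b: Segment) -> float:
    """Placeholder reward: positive only when mean features are close."""
    mean_a = sum(FEATURES[i] for i in seg_a) / len(seg_a)
    mean_b = sum(FEATURES[i] for i in seg_b) / len(seg_b)
    return 1.0 - abs(mean_a - mean_b)


segments = greedy_merge(4, [(0, 1), (1, 2), (2, 3)], toy_reward)
```

On this chain of four superpoints, the two pairs with similar features are merged, while the merge across the large feature gap is rejected because its reward falls below the threshold.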
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Instance Segmentation | S3DIS (Area 5) | mAP@50% IoU | 24 | 120 |
| 3D Instance Segmentation | S3DIS (6-fold CV) | -- | -- | 92 |
| 3D object segmentation | ScanNet 2017 (val) | AP | 24.2 | 11 |
| 3D object segmentation | ScanNet200 (val) | AP | 18.1 | 8 |
| 3D object segmentation | S3DIS (Area 1) | AP | 11.9 | 7 |
| 3D object segmentation | S3DIS (Area 6) | AP | 13.5 | 7 |
| 3D object segmentation | S3DIS (Area 3) | AP | 12.6 | 7 |
| 3D object segmentation | S3DIS (Area 4) | AP | 12.2 | 7 |
| 3D object segmentation | S3DIS (Area 5) | AP | 12.8 | 7 |
| 3D object segmentation | S3DIS (Area 2) | AP | 5.4 | 7 |
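
The table reports AP and mAP@50% IoU. As a reminder of what the IoU threshold means, here is a minimal sketch of point-level IoU between a predicted and a ground-truth object, each represented as a set of point indices. The function names and the `>=` comparison at the threshold are assumptions. The exact evaluation protocol behind these benchmark numbers is not given on this page.

```python
from typing import Set


def point_iou(pred: Set[int], gt: Set[int]) -> float:
    """Intersection over union of two sets of point indices."""
    union = len(pred | gt)
    return len(pred & gt) / union if union else 0.0


def is_match(pred: Set[int], gt: Set[int], iou_threshold: float = 0.5) -> bool:
    """Count a prediction as a match when its IoU reaches the threshold."""
    return point_iou(pred, gt) >= iou_threshold


# Two of the four distinct points are shared, so the IoU is 2 / 4.
example_iou = point_iou({1, 2, 3}, {2, 3, 4})
```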