
Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

About

Precise 3D environmental mapping is pivotal in robotics. Existing methods often rely on concepts predefined during training or are slow to generate semantic maps. This paper presents Open-Fusion, an approach for real-time open-vocabulary 3D mapping and queryable scene representation from RGB-D data. Open-Fusion uses a pre-trained vision-language foundation model (VLFM) for open-set semantic understanding and the Truncated Signed Distance Function (TSDF) for fast 3D scene reconstruction. From the VLFM, we extract region-based embeddings and their associated confidence maps. These are then integrated with the 3D knowledge from the TSDF using an enhanced Hungarian-based feature-matching mechanism. Notably, Open-Fusion performs annotation-free open-vocabulary 3D segmentation without any additional 3D training. Benchmark tests on the ScanNet dataset against leading zero-shot methods show that Open-Fusion outperforms them. Furthermore, it combines the strengths of the region-based VLFM and the TSDF, enabling real-time 3D scene understanding that covers both object concepts and open-world semantics. Demos are available on the project page: https://uark-aicv.github.io/OpenFusion
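The TSDF reconstruction mentioned above rests on a per-voxel weighted running average of truncated signed distances. The following is a minimal sketch of that standard update, not Open-Fusion's implementation. It assumes the camera frame coincides with the world frame (no pose), a pinhole camera, nearest-pixel depth lookup and a unit weight per observation; `integrate_tsdf` and its parameters are hypothetical names.

```python
import numpy as np

def integrate_tsdf(tsdf, weight, voxel_centers, depth, fx, fy, cx, cy, trunc):
    """Fuse one depth frame into a TSDF volume (hypothetical helper).

    Assumption: camera frame == world frame, so no pose transform is applied.
    tsdf, weight: (N,) per-voxel state arrays, updated in place.
    voxel_centers: (N, 3) voxel centers in camera coordinates (meters).
    depth: (H, W) depth image in meters; 0 marks missing depth.
    """
    x, y, z = voxel_centers[:, 0], voxel_centers[:, 1], voxel_centers[:, 2]
    h, w = depth.shape
    in_front = z > 0
    # Project voxel centers into the image with a pinhole model (nearest pixel).
    safe_z = np.where(in_front, z, 1.0)
    u = np.round(fx * x / safe_z + cx).astype(int)
    v = np.round(fy * y / safe_z + cy).astype(int)
    in_image = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[in_image] = depth[v[in_image], u[in_image]]
    # Projective signed distance: positive in front of the surface, negative behind it.
    sdf = d - z
    # Skip missing depth and voxels hidden far behind the observed surface.
    valid = in_image & (d > 0) & (sdf >= -trunc)
    obs = np.clip(sdf[valid] / trunc, -1.0, 1.0)
    # Weighted running average, assuming unit weight per observation.
    w_old = weight[valid]
    tsdf[valid] = (tsdf[valid] * w_old + obs) / (w_old + 1.0)
    weight[valid] = w_old + 1.0
    return tsdf, weight

# Toy example: five voxels on the optical axis, one-pixel depth image at 1.0 m.
voxels = np.array([[0.0, 0.0, z] for z in (0.8, 0.9, 1.0, 1.1, 1.2)])
tsdf = np.ones(len(voxels))
weight = np.zeros(len(voxels))
depth = np.array([[1.0]])
integrate_tsdf(tsdf, weight, voxels, depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0, trunc=0.15)
print(np.round(tsdf, 3), weight)
```

The zero crossing of `tsdf` marks the reconstructed surface; the voxel at 1.2 m lies more than `trunc` behind it and is left untouched.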
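The abstract names an "enhanced Hungarian-based feature-matching mechanism" without detailing it. The sketch below shows only the plain assignment step under stated assumptions: each existing scene instance keeps a single embedding vector, matches are scored by cosine similarity, and a hypothetical `sim_threshold` decides when a region should start a new instance. It uses SciPy's `linear_sum_assignment`, which solves the same linear assignment problem as the Hungarian algorithm. The confidence maps and whatever enhancements the paper adds are not modeled, and `match_regions` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_regions(new_emb, scene_emb, sim_threshold=0.7):
    """Assign new region embeddings to existing scene instances (hypothetical).

    new_emb: (M, D) embeddings of regions in the current frame.
    scene_emb: (K, D) embeddings of instances already in the map.
    Returns (matches, unmatched): (new_idx, scene_idx) pairs, and indices of
    new regions that should become new instances.
    """
    if len(scene_emb) == 0:
        return [], list(range(len(new_emb)))
    a = new_emb / np.linalg.norm(new_emb, axis=1, keepdims=True)
    b = scene_emb / np.linalg.norm(scene_emb, axis=1, keepdims=True)
    sim = a @ b.T  # (M, K) cosine similarities
    # One-to-one assignment maximizing total similarity.
    rows, cols = linear_sum_assignment(sim, maximize=True)
    # Reject assigned pairs that are too dissimilar (assumed threshold).
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if sim[r, c] >= sim_threshold]
    matched = {r for r, _ in matches}
    unmatched = [i for i in range(len(new_emb)) if i not in matched]
    return matches, unmatched

# Toy embeddings: two known instances, three regions in the new frame.
scene = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
new = np.array([[0.0, 0.9, 0.1], [0.95, 0.05, 0.0], [0.0, 0.0, 1.0]])
matches, unmatched = match_regions(new, scene)
print(matches, unmatched)
```

Solving a global assignment, rather than greedily taking each region's best match, prevents two regions from claiming the same instance in one frame.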
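A "queryable" representation built from VLFM region embeddings is commonly queried by comparing a text embedding against the stored embeddings. The sketch below assumes, as in CLIP-style models, that the text encoder and region embeddings share one embedding space. Toy 2-D vectors stand in for real embeddings, and `query_scene` and the instance IDs are hypothetical.

```python
import numpy as np

def query_scene(text_emb, instance_emb, labels, top_k=1):
    """Rank stored instances by cosine similarity to a text-query embedding."""
    t = text_emb / np.linalg.norm(text_emb)
    e = instance_emb / np.linalg.norm(instance_emb, axis=1, keepdims=True)
    scores = e @ t  # one cosine similarity per instance
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return [(labels[i], float(scores[i])) for i in order]

# Toy data: in practice text_emb would come from the VLFM's text encoder.
instance_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
labels = ["instance_0", "instance_1", "instance_2"]
result = query_scene(np.array([0.5, 0.9]), instance_emb, labels, top_k=2)
print(result)
```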

Kashu Yamazaki, Taisei Hanyu, Khoa Vo, Thang Pham, Minh Tran, Gianfranco Doretto, Anh Nguyen, Ngan Le• 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
3D Semantic Segmentation | ScanNet | mIoU | 8.3 | 51
3D Semantic Segmentation | Replica | 3D mIoU | 14.9 | 41
3D Open-set Semantic Segmentation | ScanNet (8 scenes) | mAcc | 67 | 7
3D Open-set Semantic Segmentation | Replica (8 standard scenes) | mAcc | 41 | 6
Text-based Object Retrieval | Sr3D | Acc@0.1 | 13 | 5
3D Object Grounding | Nr3D | Overall Accuracy (IoU=0.10) | 10.7 | 5
