Open-Vocabulary Online Semantic Mapping for SLAM

About

This paper presents an Open-Vocabulary Online 3D semantic mapping pipeline, which we denote by the acronym OVO. Given a sequence of posed RGB-D frames, we detect and track 3D segments and describe each one with a CLIP vector. A novel CLIP merging method computes these descriptors from the viewpoints in which each segment is observed. OVO has a significantly lower computational and memory footprint than offline baselines, while also achieving better segmentation metrics than both offline and online ones. Beyond segmentation performance, we show experimental results of our mapping contributions integrated with two different full SLAM backbones (Gaussian-SLAM and ORB-SLAM2). This makes OVO the first approach to use a neural network to merge CLIP descriptors and the first to demonstrate end-to-end open-vocabulary online 3D mapping with loop closure.
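
The idea of one CLIP descriptor per 3D segment, merged from several viewpoints and then queried with text, can be illustrated with a minimal sketch. It is not OVO's method: the paper merges descriptors with a neural network, whereas this example swaps in a plain weighted average as a stand-in. The function names, the per-view weights, and the toy 3-dimensional vectors are all hypothetical; real CLIP embeddings have hundreds of dimensions and come from a CLIP image or text encoder.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (an all-zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def merge_descriptors(view_descriptors, view_weights):
    """Merge the per-view descriptors of one 3D segment into one unit vector.

    ASSUMPTION: a weighted average replaces the learned merging network
    described in the abstract. The weights are hypothetical (for example,
    how much of the segment is visible in each view).
    """
    dim = len(view_descriptors[0])
    merged = [0.0] * dim
    for desc, weight in zip(view_descriptors, view_weights):
        unit = l2_normalize(desc)
        for i in range(dim):
            merged[i] += weight * unit[i]
    return l2_normalize(merged)


def classify_segment(segment_descriptor, text_embeddings):
    """Pick the label whose text embedding is most cosine-similar."""
    seg = l2_normalize(segment_descriptor)
    scores = {
        label: sum(a * b for a, b in zip(seg, l2_normalize(emb)))
        for label, emb in text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy data: two views of the same segment and two text prompts.
views = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
weights = [0.7, 0.3]
prompts = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}

merged = merge_descriptors(views, weights)
label, scores = classify_segment(merged, prompts)
print(label, scores)
```

Because the label set is just a dictionary of text embeddings, new categories can be queried at run time without retraining, which is what makes this kind of map open-vocabulary.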

Tomas Berriel Martins, Martin R. Oswald, Javier Civera • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|------|---------|--------|--------|------|
| 3D Semantic Segmentation | ScanNet (test) | mIoU | 31.58 | 109 |
| 3D Semantic Mapping | Replica | mAcc | 43.6 | 25 |
| 3D Semantic Segmentation | Replica (test) | mIoU (All) | 27.1 | 10 |
| Open-set Segmentation | Replica | mIoU | 22.3 | 8 |
| Open-Vocabulary 3D Semantic Segmentation | Replica (test) | All IoU | 33 | 7 |
| Sequence Processing | Replica monocular sequences Habitat-Sim rendered | Mean time per sequence | 32 | 5 |
| 3D Semantic Segmentation | ScanNet200 | mIoU | 16 | 5 |
| 3D Semantic Segmentation | Replica 3D | mIoU | 27.1 | 5 |
| 3D Semantic Segmentation | ScanNet20 | mIoU | 31.8 | 5 |
| Trajectory Estimation | Replica 3D | ATE RMSE | 1.9 | 3 |
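
For readers unfamiliar with the metrics in the table, the following sketch shows one common way to compute mIoU, mAcc, and ATE RMSE. It is illustrative only and makes several assumptions: classes absent from the ground truth are skipped, the estimated trajectory is assumed to be already aligned to the ground truth, and all function names are hypothetical. The individual benchmarks may use different conventions (ignored classes, alignment procedure, units).

```python
import math


def confusion_matrix(gt_labels, pred_labels, num_classes):
    """cm[g][p] counts points with ground-truth class g predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt_labels, pred_labels):
        cm[g][p] += 1
    return cm


def miou_and_macc(cm):
    """Mean IoU and mean per-class accuracy over classes present in the GT."""
    ious, accs = [], []
    n = len(cm)
    for c in range(n):
        tp = cm[c][c]
        gt_total = sum(cm[c])                         # TP + FN
        pred_total = sum(cm[r][c] for r in range(n))  # TP + FP
        if gt_total == 0:
            continue  # ASSUMPTION: skip classes that never occur in the GT
        ious.append(tp / (gt_total + pred_total - tp))
        accs.append(tp / gt_total)
    return sum(ious) / len(ious), sum(accs) / len(accs)


def ate_rmse(gt_positions, est_positions):
    """RMSE of per-frame position error; trajectories assumed pre-aligned."""
    sq_errors = [
        sum((a - b) ** 2 for a, b in zip(g, e))
        for g, e in zip(gt_positions, est_positions)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Toy example: four labeled points, two classes, two camera positions.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
miou, macc = miou_and_macc(cm)
ate = ate_rmse([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 1)])
print(miou, macc, ate)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is about 0.583, while mean accuracy is 0.75 because class 1 is fully recovered but half of class 0 is missed.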