
LatentBKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

About

This paper introduces LatentBKI, a novel probabilistic mapping algorithm that enables open-vocabulary mapping with quantifiable uncertainty. Traditional semantic mapping algorithms focus on a fixed set of semantic categories, which limits their applicability to complex robotic tasks. Vision-Language (VL) models have recently emerged as a way to jointly model language and visual features in a latent space, enabling semantic recognition beyond a predefined, fixed set of classes. LatentBKI recurrently incorporates neural embeddings from VL models into a voxel map with quantifiable uncertainty, leveraging the spatial correlations of nearby observations through Bayesian Kernel Inference (BKI). LatentBKI is evaluated against comparable explicit semantic mapping and VL mapping frameworks on the popular Matterport3D and Semantic KITTI datasets, demonstrating that it retains the probabilistic benefits of continuous mapping while adding the ability to answer open-dictionary queries. Real-world experiments demonstrate applicability to challenging indoor environments.
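The core idea above — recurrently fusing VL embeddings into a voxel map, weighting each observation by a spatial kernel so that nearby voxels share information, and tracking accumulated evidence as an uncertainty proxy — can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the `LatentVoxelMap` class, the 6-neighbourhood update, the kernel radius, and the embedding dimension are all assumptions made for this sketch (the compact-support kernel form is the one commonly used in the BKI literature).

```python
import numpy as np

def sparse_kernel(dist, ell=0.5):
    """Compact-support kernel: positive for dist < ell, exactly zero beyond.

    This is the standard sparse kernel often used in BKI mapping; the
    length scale ell is a placeholder value for the sketch.
    """
    d = np.minimum(dist / ell, 1.0)
    val = (2 + np.cos(2 * np.pi * d)) / 3 * (1 - d) + np.sin(2 * np.pi * d) / (2 * np.pi)
    return np.where(dist < ell, val, 0.0)

class LatentVoxelMap:
    """Toy voxel map storing a running weighted mean of latent embeddings.

    The accumulated kernel mass per voxel acts as an effective observation
    count: low mass means high uncertainty about that voxel's embedding.
    """

    def __init__(self, resolution=0.25, dim=512):
        self.res = resolution
        self.dim = dim
        self.mean = {}  # voxel index -> fused (weighted-mean) embedding
        self.mass = {}  # voxel index -> accumulated kernel weight

    def update(self, point, embedding):
        # Spread one (point, embedding) observation over the containing
        # voxel and its 6-neighbourhood, weighted by kernel distance.
        base = np.floor(point / self.res).astype(int)
        for off in [(0,0,0),(1,0,0),(-1,0,0),(0,1,0),(0,-1,0),(0,0,1),(0,0,-1)]:
            key = tuple(base + np.array(off))
            center = (np.array(key) + 0.5) * self.res
            w = float(sparse_kernel(np.linalg.norm(point - center)))
            if w <= 0.0:
                continue
            m = self.mass.get(key, 0.0)
            mu = self.mean.get(key, np.zeros(self.dim))
            # Incremental weighted-mean update of the fused embedding.
            self.mass[key] = m + w
            self.mean[key] = mu + (w / (m + w)) * (embedding - mu)

    def query(self, text_embedding):
        # Open-dictionary query: cosine similarity between an arbitrary
        # text embedding and each voxel's fused latent mean.
        return {
            key: float(mu @ text_embedding
                       / (np.linalg.norm(mu) * np.linalg.norm(text_embedding) + 1e-8))
            for key, mu in self.mean.items()
        }
```

A query does not need to match a class list fixed at mapping time: any text embedded into the same latent space (e.g. by a CLIP-style encoder) can be scored against the map after the fact, which is what distinguishes this style of mapping from closed-set semantic mapping.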

Joey Wilson, Ruihan Xu, Yile Sun, Parker Ewen, Minghan Zhu, Kira Barton, Maani Ghaffari • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic Mapping | SemanticKITTI (scenes 6-10) | FPS | 0.53 | 6 |
| Semantic Mapping | SceneNet Closed-set (12 full scenes) | FPS | 1.1 | 6 |
| Semantic Mapping | SceneNet (SN) | mIoU | 4.9 | 5 |
| Geometric Mapping Accuracy | KITTI | Avg Chamfer Distance | 4 | 4 |
| Semantic Mapping | KITTI | mIoU | 16.5 | 4 |
| Geometric Mapping Accuracy | SceneNet (SN) | Avg Chamfer Distance | 1.189 | 4 |
| Semantic Mapping | SceneNet Open-set (12 full scenes) | FPS | 1.67 | 2 |
