Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

About

Recent open-vocabulary robot mapping methods enrich dense geometric maps with pre-trained visual-language features. While these maps allow for the prediction of point-wise saliency maps when queried for a certain language concept, large-scale environments and abstract queries beyond the object level still pose a considerable hurdle, ultimately limiting language-grounded robotic navigation. In this work, we present HOV-SG, a hierarchical open-vocabulary 3D scene graph mapping approach for language-grounded robot navigation. Leveraging open-vocabulary vision foundation models, we first obtain state-of-the-art open-vocabulary segment-level maps in 3D and subsequently construct a 3D scene graph hierarchy consisting of floor, room, and object concepts, each enriched with open-vocabulary features. Our approach is able to represent multi-story buildings and allows robotic traversal of those using a cross-floor Voronoi graph. HOV-SG is evaluated on three distinct datasets and surpasses previous baselines in open-vocabulary semantic accuracy on the object, room, and floor level while producing a 75% reduction in representation size compared to dense open-vocabulary maps. In order to prove the efficacy and generalization capabilities of HOV-SG, we showcase successful long-horizon language-conditioned robot navigation within real-world multi-story environments. We provide code and trial video data at http://hovsg.github.io/.

Abdelrhman Werby, Chenguang Huang, Martin Büchner, Abhinav Valada, Wolfram Burgard • 2024
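
The floor, room, and object hierarchy described in the abstract can be pictured with a small data-structure sketch. This is not the authors' implementation: the `Node` class, the toy 3-dimensional feature vectors (stand-ins for open-vocabulary embeddings), the cosine-similarity scoring, and the greedy top-down search are all assumptions made for illustration. The abstract does not say how HOV-SG scores or decomposes a query.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    """One node of the hierarchy: the building root, a floor, a room, or an object."""
    name: str
    feature: list  # toy stand-in for an open-vocabulary embedding (assumption)
    children: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_child(node, query):
    """Return the child whose feature is most similar to the query vector."""
    return max(node.children, key=lambda child: cosine(child.feature, query))


def hierarchical_query(root, floor_q, room_q, object_q):
    """Greedy top-down lookup: pick a floor, then a room on it, then an object in it."""
    floor = best_child(root, floor_q)
    room = best_child(floor, room_q)
    obj = best_child(room, object_q)
    return floor.name, room.name, obj.name


def build_toy_graph():
    """Hypothetical two-floor building with hand-picked feature vectors."""
    kitchen = Node("kitchen", [0, 1, 0], [Node("sink", [0, 0, 1]), Node("fridge", [0, 1, 1])])
    office = Node("office", [1, 1, 0], [Node("desk", [1, 0, 1])])
    bedroom = Node("bedroom", [1, 0, 0], [Node("bed", [1, 1, 1])])
    floor0 = Node("floor 0", [1, 0, 0], [kitchen, office])
    floor1 = Node("floor 1", [0, 0, 1], [bedroom])
    return Node("building", [], [floor0, floor1])


if __name__ == "__main__":
    building = build_toy_graph()
    result = hierarchical_query(
        building,
        floor_q=[0.9, 0.1, 0.0],
        room_q=[0.1, 0.9, 0.0],
        object_q=[0.0, 0.1, 0.9],
    )
    print(result)
```

Because each level only compares the query against the children of the node chosen one level up, the search touches a handful of nodes rather than every point in a dense map, which is the intuition behind querying a hierarchy instead of a flat representation.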

Related benchmarks

Task | Dataset | Result | Rank
3D Semantic Segmentation | ScanNet (test) | mIoU 20.76 | 105
3D Semantic Mapping | Replica | mAcc 39.59 | 25
3D Semantic Segmentation | ScanNet 3 (val) | mIoU 34.4 | 11
3D Semantic Segmentation | ScanNet200 42 (val) | mIoU 11.2 | 9
Spatial Question Response (Object Retrieval) | HM3DSem-SQR | Accuracy (1m, ABC) 27 | 7
Hierarchical Task Analysis | SG3D 8 scenes HM3DSem (test) | Success Recall 4.55 | 6
3D Semantic Segmentation | Replica (out-of-distribution) | mIoU 0.144 | 5
Object Retrieval | OpenLex3D Replica | mAP 5.76 | 5
Object Retrieval | OpenLex3D HM3D | mAP 3.44 | 5
Sequence Processing | Replica monocular sequences Habitat-Sim rendered | Mean time per sequence 1.11e+3 | 5
Showing 10 of 13 rows
