Text2Loc: 3D Point Cloud Localization from Natural Language

About

We tackle the problem of 3D point cloud localization from a few natural language descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text. Text2Loc follows a coarse-to-fine localization pipeline: text-submap global place recognition, followed by fine localization. In global place recognition, the relational dynamics among the textual hints are captured by a hierarchical transformer with max-pooling (HTM), while text-submap contrastive learning maintains a balance between positive and negative pairs. Moreover, we propose a novel matching-free fine localization method to further refine the location predictions; it completely removes the need for complicated text-instance matching and is lighter, faster, and more accurate than previous methods. Extensive experiments show that Text2Loc improves localization accuracy by up to 2× over the state of the art on the KITTI360Pose dataset. Our project page is publicly available at https://yan-xia.github.io/projects/text2loc/.
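
To make the text-submap contrastive learning step more concrete, here is a minimal sketch of a symmetric, CLIP-style contrastive loss over a batch of paired text and submap embeddings. This is an illustration under assumptions, not the authors' implementation: the abstract does not specify the loss form, and the function name `symmetric_contrastive_loss`, the `temperature` value, and the use of cosine similarity are all hypothetical choices.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _l2_normalize(v):
    norm = math.sqrt(_dot(v, v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]

def _log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_contrastive_loss(text_emb, submap_emb, temperature=0.1):
    """Assumed CLIP-style loss: text_emb[i] and submap_emb[i] form the
    positive pair; every other pairing in the batch is a negative."""
    t = [_l2_normalize(v) for v in text_emb]
    s = [_l2_normalize(v) for v in submap_emb]
    n = len(t)
    # Cosine similarity matrix scaled by the (hypothetical) temperature.
    logits = [[_dot(t[i], s[j]) / temperature for j in range(n)] for i in range(n)]
    # Text-to-submap direction: each row should peak on its diagonal entry.
    t2s = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Submap-to-text direction: same computation on the transposed matrix.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    s2t = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (t2s + s2t)
```

Averaging the two directions treats text-to-submap and submap-to-text retrieval equally, which is one simple way to keep positives and negatives balanced within a batch.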

Yan Xia, Letian Shi, Zifeng Ding, João F. Henriques, Daniel Cremers • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Global Place Recognition | KITTI360Pose (val) | Recall@1 | 0.32 | 15 |
| Text-based position localization | KITTI360 Pose (test) | Localization Recall (k=1, ε < 5m) | 33 | 13 |
| Localization | KITTI360Pose (val) | Recall @ 5m | 77 | 12 |
| Localization | KITTI360Pose (test) | Recall @ 5m | 71 | 12 |
| Text-to-point cloud localization | KITTI360 Pose (val) | Recall@k=1 (5m) | 37 | 11 |
| Fine Localization | KITTI360Pose (val) | Recall@k=1 (5m Error) | 53 | 10 |
| Fine Localization | KITTI360Pose (test) | Recall@1 (5m) | 0.47 | 10 |
| Text-to-point-cloud-submap retrieval | KITTI360Pose (test) | Recall@1 | 0.28 | 8 |
| Global Place Recognition | KITTI360 Pose (test) | Recall@1 | 28 | 5 |
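
Several of the metrics above, such as Recall@k=1 (5m), are usually read as the fraction of text queries for which at least one of the top-k predicted positions lies within a distance threshold ε of the ground-truth position. The sketch below follows that reading; it is an assumption about how the benchmarks are scored, and the function name `recall_at_k` and its parameters are hypothetical.

```python
import math

def recall_at_k(predictions, ground_truth, k=1, eps=5.0):
    """predictions[i] is a ranked list of (x, y) candidates for query i;
    ground_truth[i] is the true (x, y) position, in the same units as eps."""
    hits = 0
    for candidates, gt in zip(predictions, ground_truth):
        # A query counts as a hit if any top-k candidate is closer than eps.
        if any(math.dist(p, gt) < eps for p in candidates[:k]):
            hits += 1
    return hits / len(ground_truth)
```

For example, with two queries where only the first has a top-ranked candidate within 5 m, `recall_at_k(..., k=1, eps=5.0)` returns 0.5.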
