
Where am I? Cross-View Geo-localization with Natural Language Descriptions

About

Cross-view geo-localization identifies the locations of street-view images by matching them with geo-tagged satellite images or OpenStreetMap (OSM) data. However, most existing studies focus on image-to-image retrieval, and fewer address text-guided retrieval, a task vital for applications like pedestrian navigation and emergency response. In this work, we introduce a novel task for cross-view geo-localization with natural language descriptions, which aims to retrieve the corresponding satellite images or OSM data from scene text descriptions. To support this task, we construct the CVG-Text dataset by collecting cross-view data from multiple cities and employing a scene text generation approach that leverages the annotation capabilities of Large Multimodal Models to produce high-quality scene text descriptions with localization details. Additionally, we propose a novel text-based retrieval localization method, CrossText2Loc, which improves recall by 10% and demonstrates excellent long-text retrieval capabilities. In terms of explainability, it provides not only similarity scores but also retrieval reasons. More information can be found at https://yejy53.github.io/CVG-Text/.

Junyan Ye, Honglin Lin, Leyan Ou, Dairong Chen, Zihao Wang, Qi Zhu, Conghui He, Weijia Li • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Cross-modal Geo-localization | CVG-Text (New York) | R@1 59.08 | 29 |
| Cross-modal Geo-localization | CVG-Text (Brisbane) | Recall@1 47.58 | 15 |
| Cross-modal Geo-localization | CVG-Text Tokyo | Recall@1 41.75 | 15 |
| Cross-modal Geo-localization | CORE World-level 1.0 (All) | R@1 51.92 | 15 |
| Cross-modal Geo-localization | CORE Intercontinental-level Subset1 1.0 | R@1 53.12 | 15 |
| Cross-modal Geo-localization | CORE Intercontinental-level Subset3 1.0 | R@1 46.97 | 15 |
| Cross-modal Geo-localization | CORE Intercontinental-level Subset4 1.0 | R@1 48.71 | 15 |
| Cross-modal Geo-localization | CORE Intercontinental-level Subset2 1.0 | R@1 59.36 | 15 |
| Text-to-Satellite Image Retrieval | CVG-Text (Brisbane) | R@1 46.08 | 14 |
| Text-to-Satellite Image Retrieval | CVG-Text Tokyo | R@1 36.83 | 14 |
Showing 10 of 24 rows
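
The results above are reported as R@1 (Recall@1), which is conventionally the share of text queries whose correct gallery item is ranked first. The sketch below shows one common way such a score can be computed. It is not the CrossText2Loc implementation: the function names, the toy embeddings, the use of cosine similarity, and the assumption that query i matches gallery item i are all illustrative choices, and the percentage scale is assumed from the values shown in the table.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(text_embeddings, image_embeddings, k=1):
    """Percentage of text queries whose matching image is in the top k.

    Assumption: text query i corresponds to gallery image i.
    """
    hits = 0
    for i, query in enumerate(text_embeddings):
        scores = [cosine_similarity(query, image) for image in image_embeddings]
        # Gallery indices sorted from most to least similar.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(text_embeddings)


# Hypothetical 2-D embeddings standing in for encoder outputs.
text_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_embeddings = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.7]]

r_at_1 = recall_at_k(text_embeddings, image_embeddings, k=1)
r_at_2 = recall_at_k(text_embeddings, image_embeddings, k=2)
print(f"R@1 = {r_at_1:.2f}, R@2 = {r_at_2:.2f}")
```

In this toy gallery the second query ranks the wrong image first, so it counts as a miss at k=1 but as a hit at k=2. A real evaluation would replace the hand-written vectors with embeddings from trained text and image encoders.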
