Where am I? Cross-View Geo-localization with Natural Language Descriptions

About

Cross-view geo-localization identifies the locations of street-view images by matching them with geo-tagged satellite images or OpenStreetMap (OSM) data. However, most existing studies focus on image-to-image retrieval, and fewer address text-guided retrieval, a task vital for applications such as pedestrian navigation and emergency response. In this work, we introduce a novel task for cross-view geo-localization with natural language descriptions, which aims to retrieve the corresponding satellite images or OSM data from scene text descriptions. To support this task, we construct the CVG-Text dataset by collecting cross-view data from multiple cities and employing a scene text generation approach that leverages the annotation capabilities of Large Multimodal Models to produce high-quality scene text descriptions with localization details. Additionally, we propose a novel text-based retrieval localization method, CrossText2Loc, which improves recall by 10% and demonstrates excellent long-text retrieval capabilities. In terms of explainability, it provides not only similarity scores but also retrieval reasons. More information can be found at https://yejy53.github.io/CVG-Text/.

Junyan Ye, Honglin Lin, Leyan Ou, Dairong Chen, Zihao Wang, Qi Zhu, Conghui He, Weijia Li • 2024

Related benchmarks

Task	Dataset	Result
Cross-modal Geo-localization	CVG-Text (New York)	R@1 62.33	54
Cross-modal Geo-localization	CVG-Text (Brisbane)	Recall@1 47.58	28
Cross-modal Geo-localization	CORE Intercontinental-level Subset1 1.0	R@1 53.12	28
Cross-modal Geo-localization	CORE Intercontinental-level Subset3 1.0	R@1 46.97	28
Text-to-Satellite Image Retrieval	CVG-Text Tokyo	R@1 41.75	27
Text-to-Satellite Image Retrieval	CVG-Text (Brisbane)	R@1 48.75	26
Cross-modal Geo-localization	CVG-Text Tokyo	Recall@1 41.75	15
Cross-modal Geo-localization	CORE World-level 1.0 (All)	R@1 51.92	15
Cross-modal Geo-localization	CORE Intercontinental-level Subset4 1.0	R@1 48.71	15
Cross-modal Geo-localization	CORE Intercontinental-level Subset2 1.0	R@1 59.36	15

Showing 10 of 28 rows
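
The R@1 / Recall@1 figures in the table are retrieval recall scores. As a rough illustration of how such a score is commonly computed for text-to-image retrieval, here is a minimal sketch. It is not the CrossText2Loc implementation or the evaluation protocol behind these benchmarks. The function name `recall_at_k`, the use of cosine similarity, and the assumption that row i of the text embeddings is paired with row i of the image embeddings are all illustrative choices.

```python
import numpy as np


def recall_at_k(text_emb, image_emb, k=1):
    """Percentage of text queries whose paired image ranks in the top k.

    Assumption (hypothetical setup): text_emb[i] describes the location
    shown in image_emb[i], so the ground-truth match of query i is index i.
    """
    text_emb = np.asarray(text_emb, dtype=float)
    image_emb = np.asarray(image_emb, dtype=float)

    # L2-normalize rows so the dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)

    # sim[i, j] is the similarity between text query i and gallery image j.
    sim = t @ v.T

    # Similarity of each query to its ground-truth image.
    true_score = np.diag(sim)

    # Zero-based rank: number of gallery images scoring strictly higher
    # than the ground-truth image.
    rank = (sim > true_score[:, None]).sum(axis=1)

    return 100.0 * float((rank < k).mean())
```

With embeddings from any text encoder and image encoder that share a feature space, `recall_at_k(text_emb, image_emb, k=1)` returns a percentage on the same 0 to 100 scale as the table values.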
