Locatability-Guided Adaptive Reasoning for Image Geo-Localization with Vision-Language Models

About

The emergence of Vision-Language Models (VLMs) has introduced new paradigms for global image geo-localization through retrieval-augmented generation (RAG) and reasoning-driven inference. However, RAG methods are constrained by retrieval database quality, while reasoning-driven approaches fail to internalize image locatability, relying on inefficient, fixed-depth reasoning paths that increase hallucinations and degrade accuracy. To overcome these limitations, we introduce an Optimized Locatability Score that quantifies an image's suitability for deep reasoning in geo-localization. Using this metric, we curate Geo-ADAPT-51K, a locatability-stratified reasoning dataset enriched with augmented reasoning trajectories for complex visual scenes. Building on this foundation, we propose a two-stage Group Relative Policy Optimization (GRPO) curriculum with customized reward functions that regulate adaptive reasoning depth, visual grounding, and hierarchical geographical accuracy. Our framework, Geo-ADAPT, learns an adaptive reasoning policy, achieves state-of-the-art performance across multiple geo-localization benchmarks, and substantially reduces hallucinations by reasoning both adaptively and efficiently.
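The paper's exact reward definitions are not given in this abstract, so the following is a purely illustrative sketch of what a "hierarchical geographical accuracy" reward with an adaptive-depth penalty might look like. All names (`hierarchical_geo_reward`, `depth_penalty`), the weight values, and the token-budget scaling are assumptions, not the authors' method: coarse-to-fine partial credit is gated so a finer level only scores if all coarser levels match, and reasoning length is penalized more heavily on highly locatable (easy) images.

```python
def hierarchical_geo_reward(pred, truth, weights=(0.2, 0.3, 0.5)):
    """Coarse-to-fine reward: partial credit for continent, country, city.

    Credit at a level is granted only if all coarser levels also match,
    so a correct city name under the wrong country earns nothing.
    (Illustrative sketch; weights are hypothetical.)
    """
    levels = ("continent", "country", "city")
    reward, prefix_ok = 0.0, True
    for w, level in zip(weights, levels):
        prefix_ok = prefix_ok and (pred[level] == truth[level])
        if prefix_ok:
            reward += w
    return reward


def depth_penalty(n_reasoning_tokens, locatability, budget=512, alpha=0.1):
    """Penalize reasoning beyond a budget that shrinks as locatability rises.

    A highly locatable image (locatability near 1.0) gets almost no
    reasoning budget; a hard image (near 0.0) gets the full budget.
    (Illustrative sketch; the scaling rule is an assumption.)
    """
    allowed = budget * (1.0 - locatability)
    overshoot = max(0.0, n_reasoning_tokens - allowed)
    return -alpha * overshoot / budget
```

A GRPO-style trainer could sum these two terms per sampled trajectory before computing group-relative advantages; the point of the sketch is only the prefix-gated credit and the locatability-scaled length penalty.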

Bo Yu, Fengze Yang, Yiming Liu, Chao Wang, Xuewen Luo, Taozhe Li, Ruimin Ke, Xiaofan Zhou, Chenxi Liu • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Image Geolocalization | YFCC4k | Success Rate @ 1 km | 32.5 | 30
Image Geolocalization | Im2GPS3k | Success Rate @ 1 km | 17.9 | 26
City and country name prediction | Geo-ADAPT-51K (test) | City Name Accuracy | 55.8 | 7
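The "Success Rate @ 1 km" metric in the benchmarks above counts a prediction as correct when its coordinates fall within 1 km great-circle distance of the ground truth. A minimal sketch of that computation using the standard haversine formula (function names here are illustrative, not from the paper):

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    R = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))


def success_rate_at_km(preds, truths, threshold_km=1.0):
    """Fraction of predicted (lat, lon) points within threshold_km of truth."""
    hits = sum(
        haversine_km(p[0], p[1], t[0], t[1]) <= threshold_km
        for p, t in zip(preds, truths)
    )
    return hits / len(preds)
```

The same function with thresholds of 25 km, 200 km, etc. yields the coarser levels commonly reported on Im2GPS3k and YFCC4k.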
