Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models

About

Previous methods for image geo-localization have typically treated the task as either classification or retrieval, often relying on black-box decisions that lack interpretability. The rise of large vision-language models (LVLMs) has enabled a rethinking of geo-localization as a reasoning-driven task grounded in visual cues. However, two major challenges persist. On the data side, existing reasoning-focused datasets are primarily based on street-view imagery, offering limited scene diversity and constrained viewpoints. On the modeling side, current approaches predominantly rely on supervised fine-tuning, which yields only marginal improvements in reasoning capabilities. To address these challenges, we propose a novel pipeline that constructs a reasoning-oriented geo-localization dataset, MP16-Reason, using diverse social media images. We introduce GLOBE, Group-relative policy optimization for Localizability assessment and Optimized visual-cue reasoning, yielding Bi-objective geo-Enhancement for the VLM in recognition and reasoning. GLOBE incorporates task-specific rewards that jointly enhance localizability assessment, visual-cue reasoning, and geolocation accuracy. Both qualitative and quantitative results demonstrate that GLOBE outperforms state-of-the-art open-source LVLMs on geo-localization tasks, particularly in diverse visual scenes, while also generating more insightful and interpretable reasoning trajectories. The data and code are available at https://github.com/lingli1996/GLOBE.
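The abstract names group-relative policy optimization with rewards covering localizability assessment, visual-cue reasoning, and geolocation accuracy, but it does not give the reward formulas or how they are combined. As a rough illustration of the general idea only, the sketch below standardizes rewards within a group of sampled responses, which is the usual group-relative advantage. The weighted sum in `combined_reward`, its weights, and all function names are hypothetical and are not taken from the paper or its repository.

```python
import statistics


def combined_reward(localizability, reasoning, geolocation,
                    weights=(1.0, 1.0, 1.0)):
    """Hypothetical scalar reward: a weighted sum of three task scores.

    The paper's actual reward design is not specified in the abstract;
    this combination rule is an assumption made for illustration.
    """
    w_loc, w_rea, w_geo = weights
    return w_loc * localizability + w_rea * reasoning + w_geo * geolocation


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    Each response's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Three sampled responses for the same image, scored on the three
# (hypothetical) reward components.
group = [
    combined_reward(1.0, 0.5, 0.2),
    combined_reward(1.0, 0.8, 0.9),
    combined_reward(0.0, 0.1, 0.0),
]
advantages = group_relative_advantages(group)
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to form the baseline.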

Ling Li, Yao Zhou, Yuxuan Liang, Fugee Tsung, Jiaheng Wei • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Image Geolocalization	IM2GPS3K (test)	Success Rate (25km)	40.18	159
Image Geolocalization	Im2GPS3k	Success Rate @ 1 km	9.84	43
Geolocalization	MAPBench 1.0 (test-hard)	Acc@500m	0.05	11
Geolocalization	MAPBench easy 1.0 (test)	Acc@500m	0.17	11
Geolocation	AVG (test)	City Acc (25km)	6.8	10
Image Geolocalization	MP16-Reason	Street 1km Success Rate	17.99	9
Geolocation	GeoSeek (val)	Success Rate (City 25km)	10.75	9
Image Geolocation	CCL-Bench	City ACC	26.33	8
Image Geolocation	CCL-Bench	Accuracy @ 1km	3.67	8
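Most metrics in the table are distance-thresholded success rates (for example 1 km or 25 km). A minimal sketch of how such a metric is commonly computed follows. It assumes great-circle (haversine) distance on a spherical Earth and a result reported as a percentage; the individual benchmarks may define distance or scaling differently, and the function names here are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def success_rate(preds, truths, threshold_km):
    """Percentage of predictions within threshold_km of the ground truth.

    preds and truths are equal-length sequences of (lat, lon) tuples.
    """
    hits = sum(
        haversine_km(p[0], p[1], t[0], t[1]) <= threshold_km
        for p, t in zip(preds, truths)
    )
    return 100.0 * hits / len(preds)


# One prediction lands near the true location, one is far away.
preds = [(48.8566, 2.3522), (0.0, 0.0)]
truths = [(48.8600, 2.3500), (10.0, 10.0)]
rate_25km = success_rate(preds, truths, threshold_km=25)
```

The same function covers the street-level and city-level rows by changing `threshold_km`.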
