GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
About
This paper presents GeoAgent, a model whose reasoning closely resembles that of humans and that derives fine-grained, address-level conclusions. Previous RL-based methods have achieved breakthroughs in performance and interpretability, but concerns remain because they rely on AI-generated chain-of-thought (CoT) data and on training strategies that conflict with geographic characteristics. To address these issues, we first introduce GeoSeek, a new geolocation dataset comprising CoT data annotated by geographic experts and professional players. We then explore the inherent characteristics of geographic tasks and propose two rewards to assist training: a geo-similarity reward and a consistency reward assessed by a consistency agent. These rewards encourage the model to converge towards correct answers from a geographic perspective while preserving the integrity and consistency of its reasoning process. Experimental results show that GeoAgent outperforms existing methods and a series of general VLLMs across multiple levels of granularity, while generating reasoning that closely aligns with human reasoning.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Geolocalization | IM2GPS3K (test) | Success Rate (25 km) | 40.75 | 93 |
| Geolocation | GeoSeek (val) | Success Rate (City, 25 km) | 15.69 | 9 |
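
For readers unfamiliar with the metric, a distance-thresholded success rate such as the 25 km entries above is commonly computed as the share of predictions that fall within the threshold of the ground-truth coordinates. The following is a minimal sketch under that assumption, not the benchmarks' official evaluation code: the function names `haversine_km` and `success_rate`, the use of great-circle (haversine) distance, and reporting the result as a percentage are all illustrative choices.

```python
import math

# Mean Earth radius in kilometres (assumption: spherical Earth model).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def success_rate(preds, truths, threshold_km=25.0):
    """Percentage of predictions within threshold_km of the ground truth.

    preds and truths are equal-length sequences of (lat, lon) tuples in degrees.
    """
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not preds:
        return 0.0
    hits = sum(
        haversine_km(*p, *t) <= threshold_km for p, t in zip(preds, truths)
    )
    return 100.0 * hits / len(preds)
```

For example, `success_rate([(48.86, 2.35)], [(48.85, 2.35)])` counts the single prediction as a hit, because the two points are roughly 1 km apart.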