Unlocking Zero-Shot Geospatial Reasoning via Indirect Rewards

About

Training robust reasoning vision-language models (VLMs) in rare domains (such as geospatial) is fundamentally constrained by supervision scarcity. While raw geospatial imagery is abundant, the amount of task-direct supervision lags far behind that of common domains. In this work, we validate an important conclusion: indirect verifiable rewards, derived from seemingly unrelated metadata, are sufficient to induce sophisticated and generalizable geospatial reasoning across a wide range of downstream tasks (25+). We present Geo-R1 as one empirical instantiation of this paradigm. Rather than relying on limited task-specific annotations (i.e., direct rewards), Geo-R1 uses scalable, verifiable indirect proxy rewards based on cross-view alignment with metadata (geolocation information) to drive reinforcement learning at scale. Such indirect rewards successfully motivate the model to discover and internalize zero-shot geospatial reasoning across diverse tasks, achieving strong zero-shot transfer on out-of-distribution benchmarks and even surpassing fully supervised specialists on certain benchmarks. These findings indicate that optimizing for indirect verifiable rewards may provide a scalable pathway to unlock generalized reasoning capabilities in rare domains with massive unlabeled data archives. Our code is available at: https://github.com/miniHuiHui/Geo-R1.
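As a rough illustration of what a verifiable reward derived from geolocation metadata could look like, the sketch below scores a cross-view matching choice. It is not taken from the Geo-R1 code: the function names, the nearest-candidate rule, and the binary 0/1 reward are all assumptions made for illustration. The only facts it relies on are that each image carries a (latitude, longitude) tag and that great-circle distance can be computed with the haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def cross_view_reward(anchor, candidates, chosen_index):
    """Hypothetical proxy reward (an assumption, not the paper's definition).

    anchor       -- (lat, lon) metadata of the query image
    candidates   -- list of (lat, lon) metadata, one per candidate image
    chosen_index -- index of the candidate the model selected

    The correct answer is derived from metadata alone: the candidate whose
    coordinates are closest to the anchor. Returns 1.0 for a match, else 0.0.
    """
    target = min(
        range(len(candidates)),
        key=lambda i: haversine_km(*anchor, *candidates[i]),
    )
    return 1.0 if chosen_index == target else 0.0


if __name__ == "__main__":
    anchor = (40.0, -75.0)
    candidates = [(41.0, -75.0), (40.001, -75.001), (10.0, 10.0)]
    print(cross_view_reward(anchor, candidates, chosen_index=1))
```

The point of the sketch is that no task-specific annotation is needed: the label comes for free from the coordinates attached to the imagery, and the reward can be checked automatically.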

Chenhui Xu, Fuxun Yu, Michael J. Bianco, Jacob Kovarskiy, Raphael Tang, Qi Zhang, Zirui Xu, Will LeVine, Brandon Dubbs, Heming Liao, Cassandra Burgess, Suvam Bag, Jay Patravali, Rupanjali Kukal, Mikael Figueroa, Rishi Madhok, Nikolaos Karianakis, Jinjun Xiong • 2025

Related benchmarks

Task                                Dataset            Metric               Result   Rank
Image Geolocalization               IM2GPS3K (test)    Success Rate (25km)  41.3     159
Visual Grounding                    DIOR-RSVG          Accuracy@0.5         17.67    34
Referring Expression Comprehension  VRSBench (test)    Accuracy@0.5         49.6     16
Open-Vocabulary Detection           NWPU VHR-10 (val)  mAP (IoU=0.5:0.95)   18.87    13
Geographic Localization             IMAGEO-GSS (test)  City Accuracy        32.7264  10
Visual Question Answering           VRSBench           Avg@5                57       10
Visual Grounding                    VRSBench Ref       IoU@50               17.18    10
Visual Question Answering           RSFG-SC            Scene Accuracy       52.46    10
Visual Question Answering           RSFG-VQA           Avg@5                0.4503   10
Visual Question Answering           RSVQA              Avg@5                34.5     10
Showing 10 of 17 rows
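The listing does not define its metrics. Under the conventional reading, Accuracy@0.5 in visual grounding is the fraction of predicted boxes whose intersection-over-union (IoU) with the ground-truth box is at least 0.5. The sketch below assumes that reading and an `(x_min, y_min, x_max, y_max)` box format; both are assumptions, and the helper names are hypothetical rather than taken from any benchmark toolkit.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(preds, truths, threshold=0.5):
    """Fraction of (prediction, ground truth) pairs with IoU >= threshold."""
    if len(preds) != len(truths) or not truths:
        raise ValueError("preds and truths must be non-empty and the same length")
    hits = sum(box_iou(p, t) >= threshold for p, t in zip(preds, truths))
    return hits / len(truths)


if __name__ == "__main__":
    preds = [(0, 0, 2, 2), (0, 0, 2, 2)]
    truths = [(0, 0, 2, 2), (1, 1, 3, 3)]
    print(accuracy_at_iou(preds, truths))
```

This returns a fraction between 0 and 1; whether a given leaderboard reports the fraction or a percentage is not stated on this page.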
