RemoteReasoner: Towards Unifying Geospatial Reasoning Workflow

About

Remote sensing imagery contains vast amounts of inherently unstructured spatial data. Interpreting complex user intents and contextual relationships in such data requires sophisticated reasoning that goes beyond simple recognition tasks. In this paper, we aim to construct an Earth observation workflow that handles complex queries by reasoning about spatial context and user intent. As a reasoning workflow, it should autonomously explore and construct its own inference paths rather than being confined to predefined ground-truth sequences. Ideally, its architecture should also be unified yet general, so that a single model can perform diverse reasoning tasks without additional fine-tuning. Existing remote sensing approaches rely on supervised fine-tuning paradigms and task-specific heads, which limits both autonomous reasoning and unified generalization. To this end, we propose RemoteReasoner, a unified workflow for geospatial reasoning. RemoteReasoner integrates a multi-modal large language model (MLLM), which interprets user instructions and localizes targets, with task transformation strategies that support multi-granularity tasks at the object, region, and pixel levels. In contrast to existing methods, our framework is trained with reinforcement learning (RL) to endow the MLLM with sufficient reasoning autonomy. At the inference stage, our transformation strategies produce diverse task output formats without requiring task-specific decoders or further fine-tuning. Experiments demonstrate that RemoteReasoner achieves state-of-the-art (SOTA) performance across multi-granularity reasoning tasks. Furthermore, it retains the MLLM's inherent generalization capability, performing robustly on unseen tasks and out-of-distribution categories.
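The abstract does not say how the transformation strategies convert one output granularity into another. As a hypothetical illustration of moving between granularities with plain geometry instead of a learned decoder, the sketch below derives an object-level bounding box from a pixel-level mask and, in the other direction, rasterizes a box into a coarse mask. The helper names `mask_to_box` and `box_to_mask`, the list-of-lists mask layout, and the inclusive `(x_min, y_min, x_max, y_max)` box convention are all assumptions made for this example, not RemoteReasoner's API. Region-level outputs are not covered.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers for illustration only; not taken from the RemoteReasoner paper.
Mask = List[List[int]]           # H x W grid of 0/1 values
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Pixel level -> object level: tightest box enclosing every foreground pixel."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None  # empty mask: there is no target to localize
    return (min(xs), min(ys), max(xs), max(ys))


def box_to_mask(box: Box, height: int, width: int) -> Mask:
    """Object level -> pixel level (coarse): fill the box, clipped to the image bounds."""
    x0, y0, x1, y1 = box
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, width - 1), min(y1, height - 1)
    return [
        [1 if (y0 <= y <= y1 and x0 <= x <= x1) else 0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
    ]
    box = mask_to_box(mask)
    print(box)  # (1, 1, 2, 2)
    print(box_to_mask(box, height=3, width=4))  # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
```

The round trip is lossy: the rasterized box covers pixels the original mask did not, which is why a real pixel-level output would need something finer than a filled box.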

Liang Yao, Fan Liu, Hongbo Lu, Chuanyi Zhang, Rui Min, Shengxiang Xu, Shimin Di, Pai Peng • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Reasoning Segmentation | EarthReason (val) | gIoU | 69.02 | 15
Reasoning Segmentation | EarthReason (test) | gIoU | 71 | 15
Socio-class Segmentation | SocioSeg (test) | cIoU | 42.9 | 10
Socio-function Segmentation | SocioSeg (test) | cIoU | 38 | 10
Socio-name Segmentation | SocioSeg (test) | cIoU | 46.6 | 10
Socio-semantic Segmentation | SocioSeg (test) | cIoU | 43.2 | 10
Socio-semantic Segmentation | SocioSeg OOD (New Region) | cIoU | 0.275 | 10
Referring Expression Segmentation | RRSIS-D (test) | gIoU | 50.97 | 8
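The table reports gIoU and cIoU without defining them. In the reasoning-segmentation literature, gIoU is usually the mean of per-image IoUs, and cIoU is the cumulative intersection divided by the cumulative union over the whole split. This gIoU is not the Generalized IoU used for box regression in object detection. The sketch below assumes those standard definitions. The function names and the rule that an image whose predicted and ground-truth masks are both empty scores IoU 1.0 are assumptions for illustration, not taken from the benchmarks' official evaluation code. Multiply the results by 100 to express them as percentages.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # H x W grid of 0/1 values (hypothetical layout for this sketch)


def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count pixels that are foreground in both masks (intersection) or in either (union)."""
    inter = union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return inter, union


def giou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Mean of per-image IoU; assumed convention: both masks empty -> IoU 1.0."""
    scores = []
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        scores.append(1.0 if union == 0 else inter / union)
    return sum(scores) / len(scores)


def ciou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Cumulative IoU: total intersection over total union across all images."""
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 1.0


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
    gts = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
    print(giou(preds, gts))            # 0.75   (mean of 1/2 and 4/4)
    print(round(ciou(preds, gts), 4))  # 0.8333 (5 / 6 pooled over both images)
```

The two metrics can disagree. gIoU weights every image equally, while cIoU pools pixel counts, so large targets dominate it. In the example, the fully correct 4-pixel image pulls cIoU above gIoU.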
