GeoSeg: Training-Free Reasoning-Driven Segmentation in Remote Sensing Imagery
About
Recent advances in MLLMs are reframing segmentation from fixed-category prediction to instruction-grounded localization. While reasoning-based segmentation has progressed rapidly in natural scenes, remote sensing lacks a generalizable solution due to the prohibitive cost of reasoning-oriented data and domain-specific challenges such as overhead viewpoints. We present GeoSeg, a zero-shot, training-free framework that bypasses the supervision bottleneck for reasoning-driven remote sensing segmentation. GeoSeg couples MLLM reasoning with precise localization via: (i) bias-aware coordinate refinement to correct systematic grounding shifts and (ii) a dual-route prompting mechanism to fuse semantic intent with fine-grained spatial cues. We also introduce GeoSeg-Bench, a diagnostic benchmark of 810 image–query pairs with hierarchical difficulty levels. Experiments show that GeoSeg consistently outperforms all baselines, with extensive ablations confirming the effectiveness and necessity of each component.
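To make the first component concrete, here is a minimal sketch of the idea behind bias-aware coordinate refinement: if an MLLM's predicted coordinates exhibit a systematic shift, that shift can be estimated from a small calibration set and subtracted from new predictions. This is not the paper's implementation; all names and the calibration procedure are illustrative assumptions.

```python
# Hedged sketch (not GeoSeg's actual code): estimate a systematic grounding
# shift from calibration pairs, then remove it from new MLLM-predicted points.
# All function and variable names here are hypothetical.

def estimate_bias(predicted, ground_truth):
    """Mean (dx, dy) offset between predicted and true point coordinates."""
    n = len(predicted)
    dx = sum(p[0] - g[0] for p, g in zip(predicted, ground_truth)) / n
    dy = sum(p[1] - g[1] for p, g in zip(predicted, ground_truth)) / n
    return dx, dy

def refine(point, bias):
    """Correct one predicted point by subtracting the estimated shift."""
    return (point[0] - bias[0], point[1] - bias[1])

# Toy calibration set: predictions are consistently shifted by (+5, -3) pixels.
preds = [(105, 47), (205, 97), (55, 17)]
truth = [(100, 50), (200, 100), (50, 20)]
bias = estimate_bias(preds, truth)    # (5.0, -3.0)
corrected = refine((305, 147), bias)  # (300.0, 150.0)
```

A real system would likely estimate the bias per image scale or per object category rather than globally, but the correction step has the same shape.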
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Reasoning Segmentation | GeoSeg-Bench | IoU | 56.4 | 14 |
| Reasoning-driven segmentation | GeoSeg-Bench | Faithfulness (Qwen-8B) | 3.64 | 14 |
| Reasoning Segmentation | SegEarth-R2 (train) | Mean IoU | 17.4 | 14 |
| Reasoning-driven segmentation | SegEarth-R2 (train) | Faithfulness (Qwen-8B) | 1.78 | 14 |