SegEarth-R2: Towards Comprehensive Language-guided Segmentation for Remote Sensing Images

About

Effectively grounding complex language to pixels in remote sensing (RS) images is a critical challenge for applications like disaster response and environmental monitoring. Current models can parse simple, single-target commands but fail when presented with complex geospatial scenarios, e.g., segmenting objects at various granularities, executing multi-target instructions, and interpreting implicit user intent. To drive progress against these failures, we present LaSeRS, the first large-scale dataset built for comprehensive training and evaluation across four critical dimensions of language-guided segmentation: hierarchical granularity, target multiplicity, reasoning requirements, and linguistic variability. By capturing these dimensions, LaSeRS moves beyond simple commands and provides a benchmark for complex geospatial reasoning. This addresses a critical gap: existing datasets oversimplify the task, which leaves models trained on them sensitive to real-world variation. We also propose SegEarth-R2, an MLLM architecture designed for comprehensive language-guided segmentation in RS, which directly confronts these challenges. The model's effectiveness stems from two key improvements: (1) a spatial attention supervision mechanism that specifically handles the localization of small objects and their components, and (2) a flexible and efficient segmentation query mechanism that handles both single-target and multi-target scenarios. Experimental results demonstrate that SegEarth-R2 achieves outstanding performance on LaSeRS and other benchmarks, establishing a powerful baseline for the next generation of geospatial segmentation. All data and code will be released at https://github.com/earth-insights/SegEarth-R2.

Zepeng Xin, Kaiyu Li, Luodi Chen, Wanchen Li, Yuchen Xiao, Hui Qiao, Weizhan Zhang, Deyu Meng, Xiangyong Cao • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Reasoning Segmentation | EarthReason (val) | gIoU 72.3 | 47 |
| Referring Remote Sensing Image Segmentation | RRSIS-D (test) | -- | 36 |
| Referring Segmentation | RISBench (test) | gIoU 70.5 | 31 |
| Reasoning Segmentation | EarthReason (test) | gIoU 73.5 | 28 |
| Referring Remote Sensing Image Segmentation | RRSIS-D (val) | -- | 28 |
| Referring Expression Segmentation | RRSIS-D | mIoU 67.9 | 27 |
| Referring Expression Segmentation | LaSeRS (test) | gIoU (Semantic) 60.2 | 8 |
| Referring Segmentation | RefSegRS (val) | gIoU 84.4 | 6 |
| Referring Segmentation | RefSegRS (test) | gIoU 74.8 | 6 |
| Referring Segmentation | RISBench (val) | gIoU 69.8 | 5 |
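
The results above report gIoU without defining it on this page. In the reasoning-segmentation literature the term usually denotes the mean of per-image IoU scores between predicted and ground-truth masks; assuming that convention applies here, a minimal sketch of the computation might look like the following. The function names `mask_iou` and `giou` are hypothetical and are not taken from the SegEarth-R2 code, and the handling of two empty masks is an assumption.

```python
def mask_iou(pred, target):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def giou(preds, targets):
    """Mean of per-image IoUs, scaled to a percentage."""
    scores = [mask_iou(p, t) for p, t in zip(preds, targets)]
    return 100.0 * sum(scores) / len(scores)


# Two toy 2x2 images: the first prediction covers one extra pixel (IoU 0.5),
# the second matches its ground truth exactly (IoU 1.0).
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
print(giou(preds, targets))  # 75.0
```

Because every image contributes equally to the average, a small object that is missed costs as much as a large one, which is one reason a per-image average suits benchmarks with many small targets.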
