RSRefSeg: Referring Remote Sensing Image Segmentation with Foundation Models

About

Referring remote sensing image segmentation is crucial for achieving fine-grained visual understanding through free-format textual input, enabling enhanced scene and object extraction in remote sensing applications. Current research primarily utilizes pre-trained language models to encode textual descriptions and align them with visual modalities, thereby facilitating the expression of relevant visual features. However, these approaches often struggle to establish robust alignments between fine-grained semantic concepts, leading to inconsistent representations across textual and visual information. To address these limitations, we introduce a referring remote sensing image segmentation foundational model, RSRefSeg. RSRefSeg leverages CLIP for visual and textual encoding, employing both global and local textual semantics as filters to generate referring-related visual activation features in the latent space. These activated features then serve as input prompts for SAM, which refines the segmentation masks through its robust visual generalization capabilities. Experimental results on the RRSIS-D dataset demonstrate that RSRefSeg outperforms existing methods, underscoring the effectiveness of foundational models in enhancing multimodal task comprehension. The code is available at \url{https://github.com/KyanChen/RSRefSeg}.

Keyan Chen, Jiafan Zhang, Chenyang Liu, Zhengxia Zou, Zhenwei Shi• 2025

Related benchmarks

Task	Dataset	Result
Referring Remote Sensing Image Segmentation	RRSIS-D (test)	Precision @ IoU 0.574.49	36
Referring Segmentation	MTRefSeg-21K v1 (val)	mIoU64.99	25
Referring Temporal Change Segmentation	MTRefSeg-21K v1	mIoU59.08	25
Change Referring Segmentation	MTRefSeg-21K NS (val)	mIoU50.02	25
Socio-class Segmentation	SocioSeg (test)	cIoU30.7	10
Socio-function Segmentation	SocioSeg (test)	cIoU28.7	10
Socio-name Segmentation	SocioSeg (test)	cIoU27.1	10
Socio-semantic Segmentation	SocioSeg (test)	cIoU29	10
Socio-semantic Segmentation	SocioSeg OOD (New Region)	cIoU0.124	10

Showing 9 of 9 rows

Other info

Follow for update

@wizwand_team Discord