Salient Object Detection in Optical Remote Sensing Images Driven by Transformer

About

Existing methods for Salient Object Detection in Optical Remote Sensing Images (ORSI-SOD) mainly adopt Convolutional Neural Networks (CNNs) as the backbone, such as VGG and ResNet. Since CNNs can only extract features within certain receptive fields, most ORSI-SOD methods generally follow the local-to-contextual paradigm. In this paper, we propose a novel Global Extraction Local Exploration Network (GeleNet) for ORSI-SOD following the global-to-local paradigm. Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies. Then, GeleNet employs a Direction-aware Shuffle Weighted Spatial Attention Module (D-SWSAM) and its simplified version (SWSAM) to enhance local interactions, and a Knowledge Transfer Module (KTM) to further enhance cross-level contextual interactions. D-SWSAM comprehensively perceives the orientation information in the lowest-level features through directional convolutions to adapt to various orientations of salient objects in ORSIs, and effectively enhances the details of salient objects with an improved attention mechanism. SWSAM discards the direction-aware part of D-SWSAM to focus on localizing salient objects in the highest-level features. KTM models the contextual correlation knowledge of two middle-level features of different scales based on the self-attention mechanism, and transfers the knowledge to the raw features to generate more discriminative features. Finally, a saliency predictor is used to generate the saliency map based on the outputs of the above three modules. Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods. The code and results of our method are available at https://github.com/MathLee/GeleNet.
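The data flow described in the abstract can be sketched as a simple routing function. This is only an illustrative sketch, not the authors' implementation: the function name `gelenet_forward`, its argument names, and the idea of passing the modules in as plain callables are all assumptions made here. The only things taken from the abstract are which feature level goes to which module, and that the predictor consumes the three module outputs.

```python
from typing import Any, Callable, Sequence


def gelenet_forward(
    features: Sequence[Any],
    d_swsam: Callable[[Any], Any],
    ktm: Callable[[Any, Any], Any],
    swsam: Callable[[Any], Any],
    predictor: Callable[[Any, Any, Any], Any],
) -> Any:
    """Route four-level backbone features through the three modules.

    All names are hypothetical. `features` stands in for the four-level
    embeddings produced by the transformer backbone, ordered from the
    lowest level to the highest level.
    """
    if len(features) != 4:
        raise ValueError("expected four feature levels")
    f1, f2, f3, f4 = features

    low = d_swsam(f1)    # lowest level: direction-aware local enhancement
    mid = ktm(f2, f3)    # two middle levels: cross-level knowledge transfer
    high = swsam(f4)     # highest level: localization without the direction-aware part

    # The saliency predictor combines the outputs of the three modules.
    return predictor(low, mid, high)


if __name__ == "__main__":
    # Stub modules that only record the routing, for illustration.
    result = gelenet_forward(
        ["f1", "f2", "f3", "f4"],
        d_swsam=lambda x: ("D-SWSAM", x),
        ktm=lambda a, b: ("KTM", a, b),
        swsam=lambda x: ("SWSAM", x),
        predictor=lambda lo, mid, hi: (lo, mid, hi),
    )
    print(result)
```

In a real model each callable would be a trained network module operating on tensors; the stubs here only make the level-to-module assignment explicit.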

Gongyang Li, Zhen Bai, Zhi Liu, Xinpeng Zhang, Haibin Ling • 2023

Related benchmarks

Task	Dataset	Result
Salient Object Detection	ORSSD	P(F_beta) 2.6757	41
Salient Object Detection	EORSSD	P(F_beta) 1.5702	41
Salient Object Detection	ORSI-4199	P(M) 6.7808	29
Object Segmentation	ORSSD 200 images	mIoU 85.75	22
Object Segmentation	EORSSD 600 images	mIoU 80.17	22
Salient Object Detection	EORSSD (test)	Mean Error (M) 0.0111	19
Salient Object Detection	ORSI-4199 (test)	M Error 0.042	19
Salient Object Detection	ORSSD (test)	Mean Error (M) 0.0227	19
Object Segmentation	ORSIs-4199 (2199 images)	mIoU 78.85	18
Saliency Object Detection	ORSI-4199	Sα 0.873	12
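Two of the metrics in the table can be illustrated with a short sketch. It assumes that "Mean Error (M)" denotes the mean absolute error between a predicted saliency map and its ground-truth mask, as is common in salient object detection, and that mIoU is the mean over images of the per-image intersection over union. The function names, the 0.5 binarization threshold, and the convention for two empty masks are assumptions for illustration and are not taken from the page.

```python
from typing import Sequence

Map2D = Sequence[Sequence[float]]  # values expected in [0, 1]


def _check_shapes(pred: Map2D, gt: Map2D) -> None:
    """Raise ValueError unless both maps are non-empty and equally shaped."""
    if len(pred) == 0 or len(pred) != len(gt):
        raise ValueError("maps must be non-empty and have the same height")
    for p_row, g_row in zip(pred, gt):
        if len(p_row) == 0 or len(p_row) != len(g_row):
            raise ValueError("rows must be non-empty and have the same width")


def mean_absolute_error(pred: Map2D, gt: Map2D) -> float:
    """Average per-pixel absolute difference (lower is better)."""
    _check_shapes(pred, gt)
    total = 0.0
    count = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            total += abs(p - g)
            count += 1
    return total / count


def iou(pred: Map2D, gt: Map2D, threshold: float = 0.5) -> float:
    """Intersection over union after binarizing both maps (higher is better)."""
    _check_shapes(pred, gt)
    inter = 0
    union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            p_on = p >= threshold
            g_on = g >= threshold
            inter += int(p_on and g_on)
            union += int(p_on or g_on)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


if __name__ == "__main__":
    pred = [[0.9, 0.1], [0.8, 0.2]]
    gt = [[1.0, 0.0], [1.0, 1.0]]
    print(mean_absolute_error(pred, gt))
    print(iou(pred, gt))
```

A dataset-level score would average these per-image values over all test images.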
