
SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection

About

Convolutional neural networks (CNNs) are good at extracting contextual features within limited receptive fields, while transformers can model global long-range dependencies. By combining the strengths of transformers and CNNs, the Swin Transformer shows strong feature-representation ability. Building on it, we propose SwinNet, a cross-modality fusion model for RGB-D and RGB-T salient object detection. It is driven by a Swin Transformer to extract hierarchical features, boosted by an attention mechanism to bridge the gap between the two modalities, and guided by edge information to sharpen the contour of the salient object. Specifically, a two-stream Swin Transformer encoder first extracts multi-modality features, and then a spatial alignment and channel re-calibration module is presented to optimize intra-level cross-modality features. To clarify fuzzy boundaries, an edge-guided decoder achieves inter-level cross-modality fusion under the guidance of edge features. The proposed model outperforms state-of-the-art models on RGB-D and RGB-T datasets, showing that it provides more insight into the cross-modality complementarity task.
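The channel re-calibration idea described above can be illustrated with a minimal sketch: each modality's feature map is squeezed by global average pooling into per-channel gating weights, each channel is rescaled by its weight, and the two re-calibrated streams are fused element-wise. This is an illustrative toy in pure Python, not the paper's exact module; the function names and the additive fusion are assumptions for demonstration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_recalibrate(feat):
    # feat: list of channels, each a 2-D list (H x W) of floats.
    # Squeeze: global average pool per channel -> gating weight in (0, 1),
    # then rescale every value in that channel by its weight.
    out = []
    for ch in feat:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        w = sigmoid(mean)
        out.append([[v * w for v in row] for row in ch])
    return out

def fuse_modalities(a, b):
    # Re-calibrate each modality independently, then fuse element-wise.
    # (Additive fusion is an assumption; the paper's module is more involved.)
    ra, rb = channel_recalibrate(a), channel_recalibrate(b)
    return [[[x + y for x, y in zip(r1, r2)]
             for r1, r2 in zip(c1, c2)]
            for c1, c2 in zip(ra, rb)]

rgb   = [[[1.0, 0.0], [0.0, 1.0]]]   # 1 channel, 2x2 toy feature map
depth = [[[0.5, 0.5], [0.5, 0.5]]]
fused = fuse_modalities(rgb, depth)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 1 2 2
```

In the full model this kind of gating is computed per encoder level, so each modality can suppress channels that are unreliable (e.g. noisy depth regions) before inter-level fusion in the decoder.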

Zhengyi Liu, Yacheng Tan, Qian He, Yun Xiao • 2022

Related benchmarks

Task                            | Dataset   | Metric          | Result | Rank
------------------------------- | --------- | --------------- | ------ | ----
RGB-D Salient Object Detection  | STERE     | S-measure (Sα)  | 0.919  | 198
RGB-D Salient Object Detection  | SIP       | S-measure (Sα)  | 0.911  | 124
RGB-D Saliency Detection        | NLPR      | Max F-beta      | 0.936  | 65
RGB-D Salient Object Detection  | NJUD      | S-measure       | 92     | 54
Salient Object Detection        | VT5000    | S-measure       | 0.912  | 50
Salient Object Detection        | VT821     | S-measure       | 0.904  | 36
Salient Object Detection        | VT1000    | F-measure (Fm)  | 0.947  | 19
Salient Object Detection        | UVT 2000  | F-measure (Fm)  | 57.9   | 18
Salient Object Detection        | un-VT5000 | F-measure (Fm)  | 82.3   | 18
Salient Object Detection        | un-VT1000 | F-measure (Fm)  | 89     | 18

Showing 10 of 19 rows
