Samba+: General and Accurate Salient Object Detection via A More Unified Mamba-based Framework

About

Existing salient object detection (SOD) models are generally constrained by the limited receptive fields of convolutional neural networks (CNNs) and the quadratic computational complexity of Transformers. Recently, the emerging state-space model Mamba has shown great potential in balancing global receptive fields and computational efficiency. Building on it, we propose Saliency Mamba (Samba), a pure Mamba-based architecture that flexibly handles various distinct SOD tasks, including RGB/RGB-D/RGB-T SOD, video SOD (VSOD), RGB-D VSOD, and visible-depth-thermal SOD. Specifically, we rethink the scanning strategy of Mamba for SOD and introduce a saliency-guided Mamba block (SGMB) that features a spatial neighborhood scanning (SNS) algorithm to preserve the spatial continuity of salient regions. A context-aware upsampling (CAU) method is also proposed to promote hierarchical feature alignment and aggregation by modeling contextual dependencies. Going one step further, to avoid the "task-specific" problem of previous SOD solutions, we develop Samba+, which is obtained by training Samba in a multi-task joint manner, leading to a more unified and versatile model. We investigate two crucial components that jointly tackle the challenges of arbitrary-modality input and continual adaptation. Specifically, a hub-and-spoke graph attention (HGA) module facilitates adaptive cross-modal interactive fusion, and a modality-anchored continual learning (MACL) strategy alleviates inter-modal conflicts together with catastrophic forgetting. Extensive experiments demonstrate that Samba individually outperforms existing methods across six SOD tasks on 22 datasets with lower computational cost, whereas Samba+ achieves even better results on these tasks and datasets using a single trained versatile model. Additional results further demonstrate the potential of our Samba framework.
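The abstract does not spell out the SNS algorithm, so the following is not the authors' method. It is a minimal sketch of the general concern behind it: a sequence model such as Mamba reads a 2D feature map as a 1D sequence, and the flattening order decides whether spatial neighbors stay adjacent in that sequence. The sketch compares a plain row-major scan with a serpentine (boustrophedon) scan, used here only as a stand-in. The function names `raster_scan`, `serpentine_scan`, and `max_step` are hypothetical.

```python
def raster_scan(h, w):
    """Row-major order: every row is read left to right."""
    return [(r, c) for r in range(h) for c in range(w)]


def serpentine_scan(h, w):
    """Boustrophedon order: odd rows are read right to left.

    Stand-in for a neighborhood-preserving scan; NOT the paper's SNS.
    """
    order = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def max_step(order):
    """Largest Manhattan distance between consecutive scan positions."""
    return max(
        abs(r1 - r0) + abs(c1 - c0)
        for (r0, c0), (r1, c1) in zip(order, order[1:])
    )


if __name__ == "__main__":
    print(max_step(raster_scan(4, 4)))      # 4: jump from row end to next row start
    print(max_step(serpentine_scan(4, 4)))  # 1: consecutive tokens are always adjacent
```

In the row-major scan, the last patch of one row and the first patch of the next are far apart on the grid yet adjacent in the sequence. The serpentine order removes those jumps. A saliency-guided scan as described in the abstract would additionally depend on the predicted salient regions, which this sketch does not model.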

Wenzhuo Zhao, Keren Fu, Jiahao He, Xiaohong Liu, Qijun Zhao, Guangtao Zhai • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
RGB-D Salient Object Detection | STERE | S-measure (Sα) | 0.937 | 198
Salient Object Detection | PASCAL-S | - | - | 186
RGB-D Salient Object Detection | SIP | S-measure (Sα) | 0.948 | 124
Skin Lesion Segmentation | ISIC 2017 (test) | Dice Score | 90.65 | 100
Camouflaged Object Detection | Chameleon | S-measure (Sα) | 92 | 96
Camouflaged Object Detection | COD10K | S-measure (Sα) | 0.886 | 83
Skin Lesion Segmentation | ISIC 2018 (test) | Dice Score | 90.05 | 74
RGB-D Saliency Detection | NLPR | Max F-beta | 0.944 | 65
RGB-D Salient Object Detection | NJUD | S-measure | 95 | 54
Video Salient Object Detection | FBMS (test) | F-score | 92.2 | 30
Showing 10 of 27 rows
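For readers unfamiliar with the overlap metrics listed above, here is a minimal sketch of the Dice score and the F-beta measure on binary masks. It assumes masks flattened to lists of 0/1 values and a single fixed binarization; the benchmark's actual evaluation protocol is not stated on this page, and "Max F-beta" normally means the maximum of this quantity over many binarization thresholds. The default `beta2=0.3` follows a common convention in SOD evaluation and is an assumption here, as are the function names.

```python
def confusion_counts(pred, truth):
    """Return (true positives, false positives, false negatives)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def dice_score(pred, truth):
    """Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def f_beta(pred, truth, beta2=0.3):
    """F-beta = (1 + b^2) * P * R / (b^2 * P + R), written in terms of counts.

    beta2 is beta squared; 0.3 is an assumed, conventional value.
    """
    tp, fp, fn = confusion_counts(pred, truth)
    denom = (1 + beta2) * tp + beta2 * fn + fp
    return 0.0 if denom == 0 else (1 + beta2) * tp / denom


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0]   # toy prediction, already binarized
    truth = [1, 0, 0, 1, 1, 0]  # toy ground truth
    print(round(dice_score(pred, truth), 4))
    print(round(f_beta(pred, truth), 4))
```

Note that the table mixes results on a 0 to 1 scale with results on a 0 to 100 scale; the functions above return values in the 0 to 1 range.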
