Samba+: General and Accurate Salient Object Detection via A More Unified Mamba-based Framework

About

Existing salient object detection (SOD) models are generally constrained by the limited receptive fields of convolutional neural networks (CNNs) and the quadratic computational complexity of Transformers. Recently, the emerging state-space model Mamba has shown great potential in balancing global receptive fields and computational efficiency. Building on it, we propose Saliency Mamba (Samba), a pure Mamba-based architecture that flexibly handles a range of distinct SOD tasks, including RGB/RGB-D/RGB-T SOD, video SOD (VSOD), RGB-D VSOD, and visible-depth-thermal SOD. Specifically, we rethink the scanning strategy of Mamba for SOD and introduce a saliency-guided Mamba block (SGMB) that features a spatial neighborhood scanning (SNS) algorithm to preserve the spatial continuity of salient regions. A context-aware upsampling (CAU) method is also proposed to promote hierarchical feature alignment and aggregation by modeling contextual dependencies. Going one step further, to avoid the "task-specific" problem of previous SOD solutions, we develop Samba+, which is obtained by training Samba in a multi-task joint manner, leading to a more unified and versatile model. We investigate two components that together address the challenges of arbitrary input modalities and continual adaptation: a hub-and-spoke graph attention (HGA) module facilitates adaptive cross-modal interactive fusion, and a modality-anchored continual learning (MACL) strategy alleviates inter-modal conflicts together with catastrophic forgetting. Extensive experiments demonstrate that Samba individually outperforms existing methods across six SOD tasks on 22 datasets with lower computational cost, whereas Samba+ achieves even better results on these tasks and datasets using a single trained versatile model. Additional results further demonstrate the potential of our Samba framework.
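The abstract does not spell out how SNS orders patches, so the following is only an illustrative sketch of why scan order matters for spatial continuity, not the paper's algorithm. It compares a plain row-major scan of salient patches with a serpentine scan and counts how often consecutive patches in the flattened sequence are not spatial neighbours. The function names (`raster_scan`, `serpentine_scan`, `count_jumps`) and the toy mask are assumptions made for this example.

```python
# Illustrative only: NOT the SNS algorithm from the paper.
# A toy comparison of two ways to flatten salient patches into a 1-D sequence.

def raster_scan(mask):
    """Visit salient cells (value 1) in plain row-major order."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def serpentine_scan(mask):
    """Visit salient cells row by row, reversing direction on odd rows."""
    order = []
    for r, row in enumerate(mask):
        cols = range(len(row)) if r % 2 == 0 else range(len(row) - 1, -1, -1)
        order.extend((r, c) for c in cols if row[c])
    return order


def count_jumps(order):
    """Count consecutive pairs that are not 8-connected neighbours."""
    return sum(
        1
        for (r0, c0), (r1, c1) in zip(order, order[1:])
        if max(abs(r0 - r1), abs(c0 - c1)) > 1
    )


# Hypothetical 3x5 patch grid with a 3x3 salient region in the middle.
mask = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]

print("raster jumps:    ", count_jumps(raster_scan(mask)))
print("serpentine jumps:", count_jumps(serpentine_scan(mask)))
```

On this toy mask the row-major scan breaks spatial adjacency at each row change, while the serpentine scan keeps every consecutive pair adjacent. A sequence model such as Mamba processes tokens in order, which is presumably why the authors revisit the scanning strategy for salient regions.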

Wenzhuo Zhao, Keren Fu, Jiahao He, Xiaohong Liu, Qijun Zhao, Guangtao Zhai • 2026

Related benchmarks

Task	Dataset	Metric	Result
RGB-D Salient Object Detection	STERE	S-measure (Sα)	0.937	232
Camouflaged Object Detection	COD10K	S-measure (Sα)	0.886	217
Camouflaged Object Detection	Chameleon	S-measure (Sα)	92	207
Salient Object Detection	PASCAL-S	--	--	196
Skin Lesion Segmentation	ISIC 2018 (test)	Dice Score	90.05	143
RGB-D Salient Object Detection	SIP	S-measure (Sα)	0.948	134
Skin Lesion Segmentation	ISIC 2017 (test)	Dice Score	90.65	134
Camouflaged Object Detection	NC4K	M Score	0.029	88
RGB-D Saliency Detection	NLPR	Max F-beta	0.944	78
RGB-D Salient Object Detection	NJUD	F-measure	95.6	78

Showing 10 of 27 rows
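For readers unfamiliar with the metrics named in the table, here is a minimal sketch of three of them on flattened saliency maps. It assumes that "M Score" denotes mean absolute error (a common convention in this literature, not confirmed by this page) and that the F-measure uses the conventional weighting β² = 0.3. The function names, the 0.5 threshold, and the toy values are assumptions for illustration, not the benchmark's evaluation code.

```python
# Minimal metric sketches on flattened maps with values in [0, 1].
# Assumptions: "M Score" = mean absolute error; F-measure uses beta^2 = 0.3.

def mae(pred, gt):
    """Mean absolute error between prediction and ground truth (lower is better)."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def _binarize(values, threshold=0.5):
    return [v >= threshold for v in values]


def dice(pred, gt, threshold=0.5):
    """Dice score: 2|A ∩ B| / (|A| + |B|) on binarized maps."""
    p, g = _binarize(pred, threshold), _binarize(gt)
    inter = sum(a and b for a, b in zip(p, g))
    total = sum(p) + sum(g)
    return 2 * inter / total if total else 1.0


def f_beta(pred, gt, threshold=0.5, beta2=0.3):
    """F-measure: (1 + b^2) * P * R / (b^2 * P + R) at a single threshold."""
    p, g = _binarize(pred, threshold), _binarize(gt)
    tp = sum(a and b for a, b in zip(p, g))
    precision = tp / sum(p) if sum(p) else 0.0
    recall = tp / sum(g) if sum(g) else 0.0
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom else 0.0


# Toy example: four pixels, three of them salient in the ground truth.
pred = [0.9, 0.8, 0.2, 0.1]
gt = [1, 1, 1, 0]

print(f"MAE    = {mae(pred, gt):.3f}")
print(f"Dice   = {dice(pred, gt):.3f}")
print(f"F-beta = {f_beta(pred, gt):.3f}")
```

These functions return fractions in [0, 1]. The table above appears to mix fractional and percentage scales (for example 0.886 and 92 for the same metric), so values may need rescaling before they are compared across rows.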
