Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection

About

Salient Object Detection (SOD) aims to identify and segment the most prominent objects in images. Advanced SOD methods often use Convolutional Neural Networks (CNNs) or Transformers for deep feature extraction. However, these methods still perform poorly and generalize weakly in complex cases. Recently, the Segment Anything Model (SAM) has been proposed as a vision foundation model with strong segmentation and generalization capabilities. Nonetheless, SAM requires accurate prompts for target objects, which are unavailable in SOD. Additionally, SAM does not exploit multi-scale and multi-level information, nor does it incorporate fine-grained details. To address these shortcomings, we propose a Multi-scale and Detail-enhanced SAM (MDSAM) for SOD. Specifically, we first introduce a Lightweight Multi-Scale Adapter (LMSA), which allows SAM to learn multi-scale information with very few trainable parameters. Then, we propose a Multi-Level Fusion Module (MLFM) to comprehensively utilize the multi-level information from SAM's encoder. Finally, we propose a Detail Enhancement Module (DEM) to supply SAM with fine-grained details. Experimental results demonstrate the superior performance of our model on multiple SOD datasets and its strong generalization to other segmentation tasks. The source code is released at https://github.com/BellyBeauty/MDSAM.
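The idea of a lightweight multi-scale adapter can be illustrated with a small sketch. This is not the authors' LMSA: the class name `MultiScaleAdapter`, the helper `avg_pool_upsample`, the pooling scales, the ReLU activation, and the zero-initialized up-projection are all assumptions chosen for illustration, and NumPy stands in for a deep learning framework. It only shows the general pattern of a bottleneck adapter that mixes features pooled at several scales and adds the result back to the frozen encoder's output.

```python
import numpy as np

def avg_pool_upsample(x, s):
    """Average-pool an (H, W, C) map over s x s windows, then repeat
    each pooled value back to the original (H, W, C) resolution.
    H and W must be divisible by s."""
    h, w, c = x.shape
    pooled = x.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3))
    return pooled.repeat(s, axis=0).repeat(s, axis=1)

class MultiScaleAdapter:
    """Hypothetical bottleneck adapter with multi-scale pooling branches."""

    def __init__(self, channels, hidden, scales=(1, 2, 4), seed=0):
        rng = np.random.default_rng(seed)
        self.scales = scales
        # Down-projection to a small hidden width keeps the parameter count low.
        self.w_down = rng.normal(0.0, 0.02, size=(channels, hidden))
        # Zero-initialized up-projection: the adapter starts as an identity map.
        self.w_up = np.zeros((hidden, channels))

    def __call__(self, x):
        z = x @ self.w_down                       # (H, W, hidden)
        # Average the same features pooled at each assumed scale.
        z = sum(avg_pool_upsample(z, s) for s in self.scales) / len(self.scales)
        z = np.maximum(z, 0.0)                    # ReLU (an assumed activation)
        return x + z @ self.w_up                  # residual connection

# Usage on a toy 8 x 8 feature map with 16 channels.
features = np.random.default_rng(1).normal(size=(8, 8, 16))
adapter = MultiScaleAdapter(channels=16, hidden=4)
adapted = adapter(features)
print(adapted.shape)
```

In this sketch only `w_down` and `w_up` would be trained, which is 2 x channels x hidden values per adapter; the encoder weights themselves stay untouched.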

Shixuan Gao, Pingping Zhang, Tianyu Yan, Huchuan Lu • 2024

Related benchmarks

Task	Dataset	Metric	Result
Camouflaged Object Detection	COD10K	S-measure (S_alpha)	0.846	217
Polyp Segmentation	ETIS (test)	Mean Dice	75.3	94
Camouflaged Object Detection	NC4K	MAE	0.042	72
Camouflaged Object Detection	Chameleon	MAE	0.027	22
Camouflaged Object Detection	CAMO	MAE	6.1	22
Saliency Detection	CSOD10K	MAE	0.042	17
RGB-D Video Salient Object Detection	DVisal	S_alpha	79.6	14
RGB-D Video Salient Object Detection	ViDSOD-100	S_alpha	87.7	14
RGB-D Video Salient Object Detection	RDVS	S_alpha	79.1	14
Salient Object Detection	DUTLF S.Aperture	MAE	0.039	13

Showing 10 of 14 rows
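Several rows above report MAE. The page does not define it, so the following is a minimal sketch assuming the definition commonly used in salient object detection: the mean absolute difference between a predicted saliency map and the binary ground-truth mask, both scaled to [0, 1]. The function name `mean_absolute_error` is illustrative and is not taken from the MDSAM repository.

```python
import numpy as np

def mean_absolute_error(pred, gt):
    """Mean absolute error between a saliency map and a ground-truth mask.

    Both inputs are expected to have the same shape and values in [0, 1].
    Lower is better; 0.0 means a perfect match.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    return float(np.abs(pred - gt).mean())

# Toy 2 x 2 example: per-pixel errors are 0.1, 0.1, 0.2, and 0.0.
pred = [[0.9, 0.1],
        [0.8, 0.0]]
gt = [[1, 0],
      [1, 0]]
print(round(mean_absolute_error(pred, gt), 3))  # 0.1
```

A dataset-level score under this definition would typically be the average of the per-image values.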
