FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

About

The Segment Anything Model (SAM) marks a notable milestone in segmentation models, highlighted by its robust zero-shot capabilities and its ability to handle diverse prompts. SAM splits interactive segmentation into two stages, image preprocessing through a large encoder and interactive inference via a lightweight decoder, which ensures efficient real-time performance. However, this pipeline makes SAM unstable on challenging samples, for two main reasons. First, the image preprocessing step prevents SAM from dynamically using image-level zoom-in strategies to refocus on the target object during interaction. Second, the lightweight decoder struggles to sufficiently integrate interactive information with the image embeddings. To address these two limitations, we propose FocSAM, which redesigns the pipeline in two pivotal respects. First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object. Dwin-MSA localizes attention computations around the target object, enhancing object-related embeddings with minimal computational overhead. Second, we propose Pixel-wise Dynamic ReLU (P-DyReLU) to enable sufficient integration of interactive information from the few initial clicks, which have a significant impact on the overall segmentation results. Experimentally, FocSAM improves SAM's interactive segmentation performance to match the existing state-of-the-art method in segmentation quality, while requiring only about 5.6% of that method's inference time on CPUs.
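To make the idea of localizing attention around the target object more concrete, here is a minimal pure-Python sketch of self-attention restricted to a window on the embedding grid. It is not the authors' Dwin-MSA: it uses a single head with identity query/key/value projections, and it assumes the window comes from a bounding box of the object (for example, from a previous mask). The names `window_from_box`, `windowed_self_attention`, and `margin` are hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def window_from_box(box, grid_h, grid_w, margin=1):
    """Expand an inclusive (top, left, bottom, right) box by `margin` cells
    and clamp it to the embedding grid. The box source is an assumption."""
    top, left, bottom, right = box
    return (
        max(0, top - margin),
        max(0, left - margin),
        min(grid_h - 1, bottom + margin),
        min(grid_w - 1, right + margin),
    )


def windowed_self_attention(feats, window):
    """Apply single-head self-attention (identity projections, an
    illustrative simplification) only to tokens inside `window`.

    feats: grid_h x grid_w x C nested lists of floats.
    Tokens outside the window are copied through unchanged.
    """
    top, left, bottom, right = window
    coords = [(r, c) for r in range(top, bottom + 1)
              for c in range(left, right + 1)]
    tokens = [feats[r][c] for r, c in coords]
    scale = 1.0 / math.sqrt(len(tokens[0]))

    # Copy the grid so the input embeddings are not modified.
    out = [[list(vec) for vec in row] for row in feats]
    for (r, c), q in zip(coords, tokens):
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out[r][c] = [
            sum(w * v[d] for w, v in zip(weights, tokens))
            for d in range(len(q))
        ]
    return out
```

The cost of the attention step scales with the square of the number of tokens inside the window rather than the full grid, which is the intuition behind refocusing with limited overhead.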

You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji • 2024

Related benchmarks

Task                              Dataset            Metric              Result   Rank
Interactive Segmentation          Berkeley           NoC@90              1.47     230
Interactive Segmentation          GrabCut            NoC@90              1.32     225
Interactive Segmentation          DAVIS              NoC@90              4.77     197
Interactive Segmentation          SBD                NoC @ 90% Target    4.69     171
Interactive Image Segmentation    GrabCut            NoC@90              1.44     28
Interactive Image Segmentation    DAVIS              NoC @ 90% IoU       4.76     27
Interactive Image Segmentation    SBD                NoC90               5.07     16
Interactive Segmentation          COD10K             NoC@90              8.91     13
Interactive Segmentation          MVTec              NoC@90              11.14    13
Interactive Image Segmentation    HQSeg-44K (val)    5-mIoU              88.6     12
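For readers unfamiliar with the metrics listed here: NoC@90 is the average number of clicks needed to reach 90% IoU (lower is better), and 5-mIoU is commonly read as the mean IoU after five clicks (higher is better). A minimal sketch of both follows, assuming each sample is given as a list of IoU values after click 1, 2, 3, and so on. The function names and the 20-click cap are assumptions for illustration (20 is a common convention in interactive segmentation evaluation, but this page does not state the cap used for these results).

```python
from typing import Sequence


def noc(ious_per_click: Sequence[float], target: float = 0.90,
        max_clicks: int = 20) -> int:
    """Number of clicks until IoU first reaches `target`.

    Returns `max_clicks` if the target is never reached within the cap
    (the cap value is an assumed convention).
    """
    for clicks, iou in enumerate(ious_per_click[:max_clicks], start=1):
        if iou >= target:
            return clicks
    return max_clicks


def mean_noc(samples: Sequence[Sequence[float]], target: float = 0.90,
             max_clicks: int = 20) -> float:
    """Average NoC over a dataset of per-sample IoU curves."""
    return sum(noc(s, target, max_clicks) for s in samples) / len(samples)


def miou_at(samples: Sequence[Sequence[float]], k: int) -> float:
    """Mean IoU after the k-th click; assumes every sample has >= k clicks."""
    return sum(s[k - 1] for s in samples) / len(samples)
```

For example, a sample whose IoU curve is `[0.5, 0.8, 0.92]` has a NoC@90 of 3, since the third click is the first to reach 0.90.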
