FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

About

The Segment Anything Model (SAM) marks a notable milestone in segmentation models, highlighted by its robust zero-shot capabilities and ability to handle diverse prompts. SAM follows a pipeline that separates interactive segmentation into image preprocessing through a large encoder and interactive inference via a lightweight decoder, ensuring efficient real-time performance. However, under this pipeline SAM becomes unstable on challenging samples. These issues arise from two main factors. First, the image preprocessing prevents SAM from dynamically using image-level zoom-in strategies to refocus on the target object during interaction. Second, the lightweight decoder struggles to sufficiently integrate interactive information with image embeddings. To address these two limitations, we propose FocSAM, whose pipeline is redesigned in two pivotal aspects. First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object. Dwin-MSA localizes attention computations around the target object, enhancing object-related embeddings with minimal computational overhead. Second, we propose Pixel-wise Dynamic ReLU (P-DyReLU) to enable sufficient integration of interactive information from the first few clicks, which have a significant impact on the overall segmentation results. Experimentally, FocSAM augments SAM's interactive segmentation performance to match the existing state-of-the-art method in segmentation quality, while requiring only about 5.6% of that method's inference time on CPUs.

You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji • 2024
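
The idea of localizing attention around the target object can be illustrated with a small NumPy sketch. This is not the authors' Dwin-MSA implementation: the function name `window_self_attention`, the box format, the identity Q/K/V projections, and the residual update are all assumptions made for illustration. It only shows the general pattern of running multi-head self-attention on the tokens inside a window of the embedding map and leaving everything outside untouched.

```python
import numpy as np

def window_self_attention(feat, box, num_heads=2):
    """Self-attention restricted to a window of an embedding map.

    feat: (H, W, C) float array of image embeddings.
    box:  (top, left, bottom, right) in embedding coordinates, half-open.
          Hypothetical format; in practice it would come from the current
          estimate of the target object's location.
    """
    top, left, bottom, right = box
    H, W, C = feat.shape
    assert C % num_heads == 0, "channels must split evenly across heads"

    window = feat[top:bottom, left:right]          # (h, w, C)
    h, w, _ = window.shape
    d = C // num_heads

    # Assumption: identity Q/K/V projections keep the sketch short;
    # a trained model would use learned linear projections here.
    heads = window.reshape(h * w, num_heads, d).transpose(1, 0, 2)  # (heads, N, d)

    # Scaled dot-product attention over the N = h * w window tokens only.
    scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(d)          # (heads, N, N)
    scores = scores - scores.max(axis=-1, keepdims=True)            # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    attended = (weights @ heads).transpose(1, 0, 2).reshape(h, w, C)

    # Residual update inside the window; embeddings outside are unchanged.
    out = feat.copy()
    out[top:bottom, left:right] = window + attended
    return out

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 4))
out = window_self_attention(feat, box=(2, 3, 6, 7))
```

In this toy setting the window holds 16 tokens instead of the full map's 64, so the attention score matrix per head has 256 entries rather than 4096, which is the kind of saving that makes window-restricted attention cheap.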

Related benchmarks

Task	Dataset	Metric	Value	
Interactive Segmentation	Berkeley	NoC@90	1.47	235
Interactive Segmentation	GrabCut	NoC@90	1.32	225
Interactive Segmentation	DAVIS	NoC@90	4.77	202
Interactive Segmentation	SBD	NoC @ 90% Target	4.69	171
Interactive Image Segmentation	GrabCut	NoC@90	1.44	28
Interactive Image Segmentation	DAVIS	NoC @ 90% IoU	4.76	27
Interactive Segmentation	COD10K	NoC@90	8.91	18
Interactive Image Segmentation	SBD	NoC90	5.07	16
Interactive Segmentation	MVTec	NoC@90	11.14	13
Interactive Image Segmentation	HQSeg-44K (val)	5-mIoU	88.6	12
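
Most rows above report a number-of-clicks (NoC) metric at a 90% IoU target, where lower is better. As a rough illustration of how such a figure is commonly computed, here is a minimal sketch. The helper names `noc` and `mean_noc`, and the 20-click cap applied to samples that never reach the target, are assumptions based on common practice in interactive segmentation and are not stated on this page.

```python
def noc(ious_per_click, threshold=0.90, max_clicks=20):
    """Number of clicks needed to first reach the IoU threshold.

    ious_per_click[k-1] is the IoU after the k-th click.
    Assumption: samples that never reach the threshold count as max_clicks.
    """
    for k, iou in enumerate(ious_per_click[:max_clicks], start=1):
        if iou >= threshold:
            return k
    return max_clicks

def mean_noc(samples, threshold=0.90, max_clicks=20):
    """Average NoC over a dataset of per-sample IoU sequences."""
    return sum(noc(s, threshold, max_clicks) for s in samples) / len(samples)

# Hypothetical IoU traces for two images, not real benchmark data.
samples = [[0.95], [0.50, 0.91]]
print(mean_noc(samples))  # 1.5
```

Under this convention a dataset-level NoC@90 of 1.47 would mean that, on average, fewer than two clicks are needed per image to reach 90% IoU.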

