
DefMamba: Deformable Visual State Space Model

About

Recently, state space models (SSMs), particularly Mamba, have attracted significant attention from scholars due to their ability to balance computational efficiency and performance. However, most existing visual Mamba methods flatten images into 1D sequences using predefined scan orders, which leaves the model less able to exploit the spatial structure of the image during feature extraction. To address this issue, we propose a novel visual foundation model called DefMamba. The model combines a multi-scale backbone with deformable Mamba (DM) blocks, which dynamically adjust the scanning path to prioritize important information, thereby improving the capture and processing of relevant input features. By incorporating a deformable scanning (DS) strategy, the model significantly improves its ability to learn image structures and detect changes in object details. Extensive experiments show that DefMamba achieves state-of-the-art performance on a range of visual tasks, including image classification, object detection, instance segmentation, and semantic segmentation. The code is open source at DefMamba.
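The contrast drawn above between predefined scan orders and a content-dependent scanning path can be illustrated with a small sketch. This is not the DefMamba implementation and does not reproduce the DM block: the function names, the toy grid, and the score-based ordering are all hypothetical, chosen only to show how a fixed flattening ignores content while a data-dependent one reorders positions according to the input.

```python
# Illustrative sketch only; NOT the DefMamba method.
# All names and the score-based ordering are assumptions made for this example.

def raster_scan(h, w):
    """Predefined row-major order: left to right, top to bottom."""
    return [(r, c) for r in range(h) for c in range(w)]

def column_scan(h, w):
    """Predefined column-major order: top to bottom, left to right."""
    return [(r, c) for c in range(w) for r in range(h)]

def content_dependent_scan(scores):
    """Hypothetical stand-in for an adaptive scan path: visit positions
    in order of descending importance score (ties keep raster order)."""
    h, w = len(scores), len(scores[0])
    return sorted(raster_scan(h, w), key=lambda rc: -scores[rc[0]][rc[1]])

def flatten(grid, order):
    """Flatten a 2D grid into a 1D sequence following the given visit order."""
    return [grid[r][c] for r, c in order]

# Toy 2x3 "image" of tokens and a made-up importance score per position.
grid = [["a", "b", "c"],
        ["d", "e", "f"]]
scores = [[0.1, 0.9, 0.3],
          [0.8, 0.2, 0.7]]

raster = flatten(grid, raster_scan(2, 3))
column = flatten(grid, column_scan(2, 3))
adaptive = flatten(grid, content_dependent_scan(scores))

print(raster)    # fixed order, independent of content
print(column)    # another fixed order
print(adaptive)  # order changes whenever the scores change
```

The two fixed scans produce the same sequence order for every image, whereas the adaptive order depends on the scores; in a learned model those scores would come from the network rather than being supplied by hand.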

Leiye Liu, Miao Zhang, Jihao Yin, Tingwei Liu, Wei Ji, Yongri Piao, Huchuan Lu • 2025

Related benchmarks

Task                   Dataset                Result           Rank
Semantic segmentation  ADE20K (val)           --               3069
Instance Segmentation  COCO 2017 (val)        APm 0.428        1275
Semantic segmentation  ADE20K                 --               1028
Object Detection       COCO                   AP50 (Box) 70.5  237
Object Detection       COCO mini (val)        AP 47.5          132
Instance Segmentation  MS-COCO                mAP Mask 42.8    111
Instance Segmentation  COCO mini (val)        --               72
Object Detection       MS-COCO                APb 47.5         51
Object Detection       MSCOCO 2017 (val)      APb 47.5         33
Image Classification   ImageNet-1K 1.0 (val)  Top-1 Acc 83.5   28