MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
About
Previous research on lightweight models has primarily focused on CNN- and Transformer-based designs. CNNs, with their local receptive fields, struggle to capture long-range dependencies. Transformers have global modeling capabilities, but in high-resolution scenarios they are limited by a computational cost that grows quadratically with the number of tokens. Recently, state-space models have gained popularity in the visual domain because of their linear computational complexity. However, despite their low FLOPs, current lightweight Mamba-based models exhibit suboptimal throughput. In this work, we propose MobileMamba, a framework that balances efficiency and performance. We design a three-stage network that significantly improves inference speed. At a fine-grained level, we introduce the Multi-Receptive Field Feature Interaction (MRFFI) module, which comprises three components: Long-Range Wavelet Transform-Enhanced Mamba (WTE-Mamba), Efficient Multi-Kernel Depthwise Convolution (MK-DeConv), and Eliminate Redundant Identity. This module integrates multi-receptive-field information and enhances the extraction of high-frequency details. Additionally, we employ training and testing strategies to further improve performance and efficiency. MobileMamba achieves up to 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods while running up to 21× faster than LocalVim on GPU. Extensive experiments on high-resolution downstream tasks demonstrate that MobileMamba surpasses current efficient models and achieves an optimal balance between speed and accuracy.
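The abstract names the MRFFI components but not how features are routed between them. As a rough illustration only, the NumPy sketch below assumes a channel split. One group goes to a global branch, one goes to a multi-kernel depthwise branch, and the remaining channels are left as identity. Every piece is a hypothetical stand-in, not the paper's implementation:

- `mrffi_sketch` and its split ratios are assumptions.
- The one-level Haar transform is a simple wavelet choice.
- `linear_scan` is a toy linear recurrence standing in for a real Mamba/selective-scan block.
- The fixed box filters stand in for learned depthwise kernels.

```python
import numpy as np


def haar_dwt2(x):
    """One-level orthonormal 2D Haar transform of a (C, H, W) array (H, W even)."""
    a, b = x[:, 0::2, 0::2], x[:, 0::2, 1::2]
    c, d = x[:, 1::2, 0::2], x[:, 1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-frequency approximation band
    lh = (a - b + c - d) / 2  # detail band: differences across columns
    hl = (a + b - c - d) / 2  # detail band: differences across rows
    hh = (a - b - c + d) / 2  # detail band: diagonal differences
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    C, h, w = ll.shape
    x = np.empty((C, 2 * h, 2 * w), dtype=ll.dtype)
    x[:, 0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[:, 0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[:, 1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[:, 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x


def linear_scan(x, decay=0.9):
    """Toy SSM stand-in: h_t = decay * h_{t-1} + x_t over the flattened
    spatial tokens of each channel. Cost grows linearly with H * W."""
    C, H, W = x.shape
    tokens = x.reshape(C, H * W)
    out = np.empty_like(tokens)
    h = np.zeros(C, dtype=x.dtype)
    for t in range(H * W):
        h = decay * h + tokens[:, t]
        out[:, t] = h
    return out.reshape(C, H, W)


def depthwise_box(x, k):
    """Per-channel k x k mean filter with zero padding (placeholder depthwise conv)."""
    p = k // 2
    H, W = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += xp[:, i:i + H, j:j + W]
    return out / (k * k)


def mrffi_sketch(x, global_ratio=0.5, local_ratio=0.25, kernels=(3, 5)):
    """Hypothetical MRFFI-style block: global branch, multi-kernel local
    branch, and untouched identity channels, concatenated in order."""
    C = x.shape[0]
    cg, cl = int(C * global_ratio), int(C * local_ratio)
    xg, xl, xi = x[:cg], x[cg:cg + cl], x[cg + cl:]

    # Global branch: toy scan on the low-frequency band. The detail bands
    # are passed through unchanged, so high-frequency content is preserved.
    ll, lh, hl, hh = haar_dwt2(xg)
    yg = haar_idwt2(linear_scan(ll), lh, hl, hh)

    # Local branch: channel groups get different kernel sizes (receptive fields).
    groups = np.array_split(xl, len(kernels), axis=0)
    yl = np.concatenate([depthwise_box(g, k) for g, k in zip(groups, kernels)], axis=0)

    # Identity branch: remaining channels skip all computation.
    return np.concatenate([yg, yl, xi], axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 8))  # (channels, height, width)
y = mrffi_sketch(x)
print(y.shape)  # (8, 8, 8)
```

In this sketch, the last two channels come back unchanged and cost no computation. That is one plausible reading of "eliminating redundant identity," offered as an assumption rather than the paper's definition.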
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic Segmentation | ADE20K (val) | mIoU | 42.5 | 2731 |
| Object Detection | MS-COCO 2017 (val) | mAP | 29.5 | 237 |
| Image Classification | ImageNet-1K 1.0 (test) | Top-1 Accuracy | 83.6 | 197 |
| Image Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy | 82.5 | 15 |
| Cell Detection | CoNSeP (test) | mAP | 23.2 | 14 |
| Cell Detection | CytoDArk0 (test) | mAP | 44.1 | 14 |
| Object Detection | CytoDArk0 | mAP@50 | 74.9 | 14 |