SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

About

High-quality semantic segmentation relies on three key capabilities: global context modeling, local detail encoding, and multi-scale feature extraction. However, recent methods struggle to possess all these capabilities simultaneously. Hence, we aim to empower segmentation networks to simultaneously carry out efficient global context modeling, high-quality local detail encoding, and rich multi-scale feature representation for varying input resolutions. In this paper, we introduce SegMAN, a novel linear-time model comprising a hybrid feature encoder dubbed SegMAN Encoder, and a decoder based on state space models. Specifically, the SegMAN Encoder synergistically integrates sliding local attention with dynamic state space models, enabling highly efficient global context modeling while preserving fine-grained local details. Meanwhile, the MMSCopE module in our decoder enhances multi-scale context feature extraction and adaptively scales with the input resolution. Our SegMAN-B Encoder achieves 85.1% ImageNet-1k accuracy (+1.5% over VMamba-S with fewer parameters). When paired with our decoder, the full SegMAN-B model achieves 52.6% mIoU on ADE20K (+1.6% over SegNeXt-L with 15% fewer GFLOPs), 83.8% mIoU on Cityscapes (+2.1% over SegFormer-B3 with half the GFLOPs), and 1.6% higher mIoU than VWFormer-B3 on COCO-Stuff with lower GFLOPs. Our code is available at https://github.com/yunxiangfu2001/SegMAN.

Yunxiang Fu, Meng Lou, Yizhou Yu • 2024

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K (val)	--	3069
Instance Segmentation	COCO 2017 (val)	--	1275
Image Classification	ImageNet-1K	Top-1 Acc 85.5	1239
Semantic segmentation	Cityscapes	mIoU 84.2	668
Semantic segmentation	Cityscapes (val)	mIoU 84.2	572
Semantic segmentation	ADE20K	mIoU 53.2	559
Semantic segmentation	Cityscapes (val)	mIoU 84.2	527
Panoptic Segmentation	COCO 2017 (val)	PQ 56.8	185
Semantic segmentation	COCO Stuff (val)	--	167
Semantic segmentation	COCO	mIoU 48.2	110

Showing 10 of 29 rows
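
Most of the segmentation results above are reported as mIoU (mean intersection-over-union). As a reading aid, here is a minimal sketch of how that metric is commonly computed. It is not taken from the SegMAN code base. The function name `mean_iou`, the flattened-list input format, and the ignore label 255 are assumptions made for illustration. Benchmark implementations usually accumulate the per-class counts over the whole dataset before averaging.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels of equal
    length. ignore_index (assumed here to be 255) marks unlabeled pixels.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count toward any class
        if p == t:
            # A correct pixel adds one to both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Classes with an empty union are absent and excluded from the mean.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: flattened label maps of a 2x3 image with 3 classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. The last pixel is skipped because its target carries the ignore label.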
