
Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

About

Multi-modal semantic segmentation significantly enhances AI agents' perception and scene understanding, especially under adverse conditions such as low-light or overexposed environments. Leveraging additional modalities (X-modality) such as thermal and depth alongside traditional RGB provides complementary information, enabling more robust and reliable prediction. In this work, we introduce Sigma, a Siamese Mamba network for multi-modal semantic segmentation. Unlike conventional methods that rely on CNNs, with their limited local receptive fields, or Vision Transformers (ViTs), which offer global receptive fields at the cost of quadratic complexity, our model achieves global receptive fields with linear complexity. By employing a Siamese encoder and introducing a Mamba-based fusion mechanism, we effectively select essential information from different modalities. A decoder is then developed to enhance the channel-wise modeling ability of the model. Our proposed method is rigorously evaluated on both RGB-Thermal and RGB-Depth semantic segmentation tasks, demonstrating its superiority and marking the first successful application of State Space Models (SSMs) in multi-modal perception tasks. Code is available at https://github.com/zifuwan/Sigma.
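
To make the linear-complexity claim concrete, here is a minimal sketch of a scalar linear state space recurrence, the basic building block behind SSMs. It is not Sigma's or Mamba's implementation: the function name `ssm_scan` and the fixed parameters `a`, `b`, `c` are illustrative assumptions, and Mamba additionally makes its parameters depend on the input. The sketch only shows that each output can depend on all earlier inputs while the sequence is traversed once.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    The parameters a, b, c are hypothetical fixed scalars chosen for
    illustration; they are not taken from the Sigma paper.
    """
    h = 0.0          # hidden state carries a summary of all past inputs
    ys = []
    for x in xs:     # a single pass: work grows linearly with len(xs)
        h = a * h + b * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # An impulse at the first step decays geometrically through the state.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
```

By contrast, self-attention compares every position with every other position, which is where the quadratic cost mentioned in the abstract comes from.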

Zifu Wan, Pingping Zhang, Yuhao Wang, Silong Yong, Simon Stepputtis, Katia Sycara, Yaqi Xie • 2024

Related benchmarks

Task                            Dataset                  Result          Rank
Semantic segmentation           MFNet (test)             mIoU 60.2       168
Multimodal Sentiment Analysis   MOSEI                    --              168
Multimodal Sentiment Analysis   MOSI                     Accuracy 86.3   72
Semantic segmentation           SUN RGB-D                mIoU 52.4       65
Semantic segmentation           PST900                   mIoU 88.6       57
Semantic segmentation           MFNet nighttime (test)   mIoU 60.9       42
Semantic segmentation           SUN-RGBD                 IoU 52.4        37
Semantic segmentation           MFNet daytime (test)     mIoU 55         30
Semantic segmentation           NYU Depth V2             mIoU 57         28
Semantic segmentation           NYU Depth V2             mIoU 57         27
Showing 10 of 19 rows
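
Most results in the benchmark list are reported as mIoU (mean intersection over union), shown there as percentages. The following is a minimal sketch of how that metric is commonly computed, assuming predictions and labels are flat lists of integer class ids. The function name `mean_iou` is hypothetical, and this is not the evaluation code used for the numbers above; benchmark toolkits typically accumulate counts over the whole test set and may handle ignored labels differently.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class intersection over union for flat label lists.

    Classes that appear in neither pred nor target are skipped so they
    do not drag the mean down (an assumption; conventions vary).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for cls in range(num_classes):
        # Pixels where both prediction and ground truth are this class.
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        # Pixels where either prediction or ground truth is this class.
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1]
    target = [0, 1, 1, 1]
    # Class 0: 1/2, class 1: 2/3, so the mean is 7/12.
    print(mean_iou(pred, target, num_classes=2))
```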
