MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders

About

Multi-task dense scene understanding, which trains a single model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependencies and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba++, a novel architecture for multi-task scene understanding featuring a Mamba-based decoder. It contains two types of core blocks: the self-task Mamba (STM) block and the cross-task Mamba (CTM) block. STM handles long-range dependencies by leveraging state-space models, while CTM explicitly models task interactions to facilitate information exchange across tasks. We design two variants of the CTM block, namely F-CTM and S-CTM, to enhance cross-task interaction from the feature and semantic perspectives, respectively. Extensive experiments on the NYUDv2, PASCAL-Context, and Cityscapes datasets demonstrate the superior performance of MTMamba++ over CNN-based, Transformer-based, and diffusion-based methods while maintaining high computational efficiency. The code is available at https://github.com/EnVision-Research/MTMamba.
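
For readers unfamiliar with state-space models, the following is a minimal sketch of a scalar, time-invariant linear recurrence, the basic mechanism that lets a state carry information across many sequence positions. It is not the STM block from the paper: Mamba uses input-dependent (selective) parameters and operates on high-dimensional features. The function name `ssm_scan` and the parameters `a`, `b`, and `c` are assumptions chosen for illustration.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state-space recurrence over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    a, b, c are fixed illustrative constants, not learned parameters.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


# An impulse at position 0 still influences the output several steps later,
# decaying as a**t. This carry-over is what gives the state its long range.
impulse = [1.0] + [0.0] * 9
response = ssm_scan(impulse)
```

With `a` close to 1 the influence of early inputs decays slowly, and with `a` close to 0 the model only sees its most recent inputs.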

Baijiong Lin, Weisen Jiang, Pengguang Chen, Shu Liu, Ying-Cong Chen • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Semantic segmentation | Cityscapes (val) | mIoU | 76.41 | 527 |
| Semantic segmentation | Cityscapes | mIoU | 91.11 | 494 |
| Surface Normal Estimation | NYU v2 (test) | - | - | 224 |
| Depth Estimation | NYU Depth V2 | RMSE | 0.4818 | 209 |
| Depth Estimation | NYU V2 | RMSE | 0.4818 | 167 |
| Semantic segmentation | NYUD v2 | mIoU | 57.01 | 150 |
| Depth Estimation | NYU v2 (val) | RMSE | 0.4818 | 65 |
| Saliency Detection | Pascal Context (test) | maxF | 85.56 | 57 |
| Surface Normal Estimation | Pascal Context (test) | mErr | 14.29 | 50 |
| Saliency Detection | Pascal Context | maxF Score | 85.56 | 45 |
Showing 10 of 27 rows
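
As a reading aid for the mIoU and RMSE columns, here is a minimal sketch of how these two metrics are commonly defined, using flat Python lists in place of image tensors. The helper names `rmse` and `mean_iou` are assumptions, and the benchmark protocols behind the numbers above may differ in detail (for example, accumulating statistics over a whole dataset or ignoring certain labels).

```python
import math


def rmse(pred, target):
    """Root mean squared error between two equal-length sequences (lower is better)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    squared = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared) / len(squared))


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes that appear in pred or target.

    Returns a fraction in [0, 1]; multiply by 100 to compare with
    percentage-style scores.
    """
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy depth values: only the last element is off, by 2.0.
depth_error = rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])  # 1.0

# Toy per-pixel labels: per-class IoUs are 1/3, 2/3 and 1/2.
seg_score = mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)  # 0.5
```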
