MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

About

Multi-task dense scene understanding, which learns a model for multiple dense prediction tasks, has a wide range of application scenarios. Modeling long-range dependencies and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba, a novel Mamba-based architecture for multi-task scene understanding. It contains two types of core blocks: the self-task Mamba (STM) block and the cross-task Mamba (CTM) block. The STM block handles long-range dependencies by leveraging Mamba, while the CTM block explicitly models task interactions to facilitate information exchange across tasks. Experiments on the NYUDv2 and PASCAL-Context datasets demonstrate the superior performance of MTMamba over Transformer-based and CNN-based methods. Notably, on the PASCAL-Context dataset, MTMamba achieves improvements of +2.08, +5.01, and +4.90 over the previous best methods in the tasks of semantic segmentation, human parsing, and object boundary detection, respectively. The code is available at https://github.com/EnVision-Research/MTMamba.

Baijiong Lin, Weisen Jiang, Pengguang Chen, Yu Zhang, Shu Liu, Ying-Cong Chen • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | Cityscapes | mIoU 90.77 | 494 |
| Surface Normal Estimation | NYU v2 (test) | -- | 224 |
| Depth Estimation | NYU Depth V2 | RMSE 0.5066 | 209 |
| Depth Estimation | NYU V2 | RMSE 0.5066 | 167 |
| Semantic segmentation | NYUD v2 | mIoU 55.82 | 150 |
| Depth Estimation | NYU v2 (val) | RMSE 0.5066 | 65 |
| Saliency Detection | Pascal Context (test) | maxF 84.14 | 57 |
| Surface Normal Estimation | Pascal Context (test) | mErr 14.14 | 50 |
| Surface Normal Estimation | Pascal Context | Mean Error (MAE) 14.14 | 45 |
| Saliency Detection | Pascal Context | maxF Score 84.14 | 45 |
Showing 10 of 23 rows
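
For readers unfamiliar with two of the metrics in the table, the sketch below shows how RMSE (depth estimation, lower is better) and mIoU (semantic segmentation, higher is better) are commonly computed. It is a minimal illustration on flat Python lists, not the evaluation code used for these benchmarks. The function names `rmse` and `mean_iou` are hypothetical, and the choice to skip classes absent from both prediction and ground truth is an assumption; benchmark toolkits differ on this and on how ignored labels are handled.

```python
import math


def rmse(pred, target):
    """Root mean squared error over paired values (lower is better)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    squared_error = sum((p - t) ** 2 for p, t in zip(pred, target))
    return math.sqrt(squared_error / len(pred))


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes (higher is better).

    Assumption: classes that appear in neither pred nor target are skipped
    rather than counted as IoU 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must be the same length")
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy inputs: four "pixels" of depth values and four class labels.
depth_rmse = rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
seg_miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(f"RMSE: {depth_rmse:.4f}")
print(f"mIoU: {100 * seg_miou:.2f}")
```

mIoU values in tables like the one above are typically reported as percentages, which is why the toy result is scaled by 100 before printing.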
