MacFormer: Semantic Segmentation with Fine Object Boundaries

About

Semantic segmentation involves assigning a specific category to each pixel in an image. While Vision Transformer-based models have made significant progress, current semantic segmentation methods often struggle with precise predictions in localized areas like object boundaries. To tackle this challenge, we introduce a new semantic segmentation architecture, ``MacFormer'', which features two key components. Firstly, using learnable agent tokens, a Mutual Agent Cross-Attention (MACA) mechanism effectively facilitates the bidirectional integration of features across encoder and decoder layers. This enables better preservation of low-level features, such as elementary edges, during decoding. Secondly, a Frequency Enhancement Module (FEM) in the decoder leverages high-frequency and low-frequency components to boost features in the frequency domain, benefiting object boundaries with minimal computational complexity increase. MacFormer is demonstrated to be compatible with various network architectures and outperforms existing methods in both accuracy and efficiency on benchmark datasets ADE20K and Cityscapes under different computational constraints.

Guoan Xu, Wenfeng Huang, Tao Wu, Ligeng Chen, Wenjing Jia, Guangwei Gao, Xiatian Zhu, Stuart Perry• 2024

Related benchmarks

Task	Dataset	Result	Rank
Semantic segmentation	ADE20K	mIoU52.8		699

Showing 1 of 1 rows

Other info

Follow for update

@wizwand_team Discord