Expectation-Maximization Attention Networks for Semantic Segmentation

About

The self-attention mechanism has been widely used for various tasks. It computes the representation of each position as a weighted sum of the features at all positions, and can thus capture long-range relations in computer vision tasks. However, it is computationally expensive, since the attention maps are computed with respect to all other positions. In this paper, we formulate the attention mechanism in an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. Through a weighted summation over these bases, the resulting representation is low-rank and suppresses noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of the input and is also efficient in memory and computation. Moreover, we introduce bases maintenance and normalization methods to stabilize its training procedure. We conduct extensive experiments on popular semantic segmentation benchmarks including PASCAL VOC, PASCAL Context and COCO Stuff, on which we set new records.
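The iterative procedure described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the softmax responsibilities, the weighted-mean update, the L2 normalization of the bases, the random initialization, and the names `ema_attention`, `num_bases`, and `num_iters` are all choices made here for the example.

```python
import numpy as np

def ema_attention(x, num_bases=8, num_iters=3, seed=0):
    """Illustrative EM-style attention over a compact set of bases.

    x: array of shape (N, C), one C-dimensional feature per position.
    Returns (x_hat, mu, z): the low-rank reconstruction (N, C),
    the bases (K, C), and the attention maps (N, K).
    """
    rng = np.random.default_rng(seed)
    _, c = x.shape

    # Assumption: bases start as random unit vectors.
    mu = rng.standard_normal((num_bases, c))
    mu /= np.linalg.norm(mu, axis=1, keepdims=True)

    for _ in range(num_iters):
        # E-step: attention of each position over the K bases
        # (a softmax over inner products; an assumed kernel choice).
        logits = x @ mu.T
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        z = np.exp(logits)
        z /= z.sum(axis=1, keepdims=True)

        # M-step: each basis becomes the attention-weighted mean of the features.
        mu = (z.T @ x) / (z.sum(axis=0)[:, None] + 1e-6)

        # Assumed normalization step: keep the bases at unit L2 norm.
        mu /= np.linalg.norm(mu, axis=1, keepdims=True) + 1e-6

    # Re-estimation: a weighted sum of K bases, so rank(x_hat) <= K.
    x_hat = z @ mu
    return x_hat, mu, z
```

In this sketch the attention maps have shape (N, K) rather than the (N, N) of full self-attention, which is where the memory and computation savings would come from when K is much smaller than N.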

Xia Li, Zhisheng Zhong, Jianlong Wu, Yibo Yang, Zhouchen Lin, Hong Liu • 2019

Related benchmarks

Task	Dataset	Result
Semantic segmentation	PASCAL VOC 2012 (test)	mIoU 87.7	1477
Semantic segmentation	Cityscapes	--	668
Semantic segmentation	COCO Stuff	mIoU 39.9	399
Semantic segmentation	PASCAL Context (val)	mIoU 53.1	360
Semantic segmentation	Cityscapes (val)	mIoU 81	301
Semantic segmentation	Pascal VOC (test)	mIoU 88.2	268
Semantic segmentation	Pascal Context (test)	mIoU 53.1	223
Semantic segmentation	Pascal Context	mIoU 53.1	217
Semantic segmentation	Coco-Stuff (test)	mIoU 39.9	216
Semantic segmentation	Mapillary (val)	mIoU 47.5	153

Showing 10 of 22 rows
