Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

About

Spatial pooling has been proven highly effective in capturing long-range contextual information for pixel-wise prediction tasks, such as scene parsing. In this paper, beyond conventional spatial pooling that usually has a regular shape of NxN, we rethink the formulation of spatial pooling by introducing a new pooling strategy, called strip pooling, which considers a long but narrow kernel, i.e., 1xN or Nx1. Based on strip pooling, we further investigate spatial pooling architecture design by 1) introducing a new strip pooling module that enables backbone networks to efficiently model long-range dependencies, 2) presenting a novel building block with diverse spatial pooling as a core, and 3) systematically comparing the performance of the proposed strip pooling and conventional spatial pooling techniques. Both novel pooling-based designs are lightweight and can serve as an efficient plug-and-play module in existing scene parsing networks. Extensive experiments on popular benchmarks (e.g., ADE20K and Cityscapes) demonstrate that our simple approach establishes new state-of-the-art results. Code is made available at https://github.com/Andrew-Qibin/SPNet.
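To make the 1xN / Nx1 kernel shape concrete, here is a minimal sketch in plain Python. It assumes each strip spans a full row or column of the feature map and that the values are averaged; the abstract does not state either detail. The names `strip_pool` and `feature_map` are hypothetical and are not taken from the authors' repository, which remains the reference for the actual module.

```python
def strip_pool(feature_map):
    """Average-pool a 2D map (a list of equal-length rows) along full strips.

    Assumption for illustration: each strip covers an entire row or column.
    Returns (row_means, col_means):
      row_means[i] averages row i (a 1 x W strip), giving H values;
      col_means[j] averages column j (an H x 1 strip), giving W values.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    row_means = [sum(row) / width for row in feature_map]
    # zip(*feature_map) iterates over the columns of the map.
    col_means = [sum(col) / height for col in zip(*feature_map)]
    return row_means, col_means


# A toy 3 x 4 single-channel feature map.
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

row_means, col_means = strip_pool(feature_map)
print(row_means)  # [2.5, 6.5, 10.5]
print(col_means)  # [5.0, 6.0, 7.0, 8.0]
```

Each output value summarizes every position along one row or one column, which is how a narrow kernel can reach across the whole map in a single direction, whereas an NxN window only covers a local square.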

Qibin Hou, Li Zhang, Ming-Ming Cheng, Jiashi Feng • 2020

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K (val)	mIoU 45.6	3069
Semantic segmentation	Cityscapes (test)	mIoU 82	1252
Semantic segmentation	Cityscapes (val)	mIoU 81.9	527
Semantic segmentation	Pascal Context (test)	mIoU 54.5	223
Semantic segmentation	PASCAL-Context 60 classes (test)	mIoU 54.5	54
Face Parsing	iBugMask (test)	Left Brow 73.2	6
