MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

About

We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves 3.0% PQ over DETR with similar parameters and M-Adds. Furthermore, MaX-DeepLab, without test time augmentation, achieves new state-of-the-art 51.3% PQ on COCO test-dev set. Code is available at https://github.com/google-research/deeplab2.
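The results quoted above and in the benchmark table below are in panoptic quality (PQ). As a rough illustration of the metric only, and not of MaX-DeepLab's PQ-inspired training loss, here is a minimal sketch of the standard single-class PQ computation. It assumes segments are given as sets of pixel indices and that segments within a prediction do not overlap; the function name `panoptic_quality` and the input format are hypothetical and are not taken from the deeplab2 code.

```python
def panoptic_quality(pred_segments, gt_segments):
    """Single-class PQ from two lists of pixel-index sets (illustrative sketch).

    A predicted and a ground-truth segment match when their IoU exceeds 0.5.
    With non-overlapping segments, such a match is unique.
    """
    matched_ious = []
    used_preds = set()
    for gt in gt_segments:
        for i, pred in enumerate(pred_segments):
            if i in used_preds:
                continue
            union = len(gt | pred)
            iou = len(gt & pred) / union if union else 0.0
            if iou > 0.5:
                matched_ious.append(iou)
                used_preds.add(i)
                break

    tp = len(matched_ious)           # matched pairs
    fp = len(pred_segments) - tp     # unmatched predictions
    fn = len(gt_segments) - tp       # unmatched ground-truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched_ious) / denom if denom else 0.0


# Toy example: one good match (IoU 0.75), one missed segment, one spurious one.
gt = [{0, 1, 2, 3}, {10, 11}]
pred = [{0, 1, 2}, {20, 21}]
pq = panoptic_quality(pred, gt)
```

In the toy example, PQ is 0.75 / (1 + 0.5 + 0.5) = 0.375. Dataset-level PQ is usually obtained by computing this per class and averaging over classes.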

Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen • 2020

Related benchmarks

Task	Dataset	Result
Panoptic Segmentation	Cityscapes (val)	PQ 61.7	288
Panoptic Segmentation	COCO (val)	PQ 51.1	223
Panoptic Segmentation	COCO 2017 (val)	PQ 51.1	185
Panoptic Segmentation	COCO (test-dev)	PQ 51.3	162
Panoptic Segmentation	COCO 2017 (test-dev)	PQ 51.3	41
Panoptic Segmentation	COCO (test)	PQ 49	23
Panoptic Segmentation	COCO panoptic 133 categories (val)	PQ 51.1	12

