Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
About
Recently, Neural Architecture Search (NAS) has successfully identified neural network architectures that exceed human-designed ones on large-scale image classification. In this paper, we study NAS for semantic image segmentation. Existing works often focus on searching the repeatable cell structure while hand-designing the outer network structure that controls the spatial resolution changes. This choice simplifies the search space, but becomes increasingly problematic for dense image prediction, which exhibits many more network-level architectural variations. Therefore, we propose to search the network-level structure in addition to the cell-level structure, which forms a hierarchical architecture search space. We present a network-level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images). We demonstrate the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets. Auto-DeepLab, our architecture searched specifically for semantic image segmentation, attains state-of-the-art performance without any ImageNet pretraining.
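The abstract does not spell out the formulation behind the gradient-based search. As a rough illustration only, one common way to make an architecture choice differentiable is to replace the discrete pick of one operation with a softmax-weighted mixture of all candidates, then keep the highest-weighted candidate at the end. The sketch below assumes that style of relaxation; the names `mixed_op`, `derive_choice`, and the toy scalar operations are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, ops, alphas):
    """Continuous relaxation: weighted sum of every candidate op's output.

    `alphas` are architecture parameters (one logit per candidate). Because
    the output is a smooth function of `alphas`, they can be trained by
    gradient descent alongside the ordinary network weights.
    """
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))


def derive_choice(names, alphas):
    """After search, keep the candidate with the largest architecture weight."""
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]


# Toy candidates acting on a scalar (stand-ins for conv / pooling / skip ops).
names = ["identity", "double", "zero"]
ops = [lambda v: v, lambda v: 2.0 * v, lambda v: 0.0]

uniform = mixed_op(3.0, ops, [0.0, 0.0, 0.0])   # equal weights on all three ops
chosen = derive_choice(names, [0.1, 1.5, -0.3])  # picks "double"
```

In a hierarchical search space, the same idea could in principle be applied twice: once to the operations inside a cell and once to the network-level decisions about spatial resolution. That reading is an assumption based on the abstract's wording, not a description of the published method.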
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic segmentation | ADE20K (val) | mIoU | 44 | 2731 |
| Semantic segmentation | PASCAL VOC 2012 (val) | Mean IoU | 80.75 | 2040 |
| Semantic segmentation | PASCAL VOC 2012 (test) | mIoU | 85.6 | 1342 |
| Semantic segmentation | Cityscapes (test) | mIoU | 82.1 | 1145 |
| Semantic segmentation | Cityscapes (val) | mIoU | 80.33 | 572 |
| Semantic segmentation | Cityscapes (val) | mIoU | 80.3 | 332 |
| Semantic segmentation | Pascal VOC (test) | mIoU | 85.6 | 236 |
| Semantic segmentation | VOC | mIoU | 82 | 44 |
| Semantic segmentation | Cityscapes fine+coarse (test) | mIoU | 82.1 | 12 |
| Semantic Image Segmentation | Cityscapes | mIoU (%) | 82.1 | 8 |
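
The benchmark results above are reported as mean intersection-over-union (mIoU). As a minimal sketch of how that metric is typically computed, the function below takes flattened per-pixel label lists and averages the per-class IoU over classes that occur in the prediction or the ground truth. The function name, the `ignore_label=255` convention, and the toy labels are assumptions for illustration; official benchmark scripts usually accumulate the counts over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean IoU over classes that occur in `pred` or `target`.

    `pred` and `target` are equal-length sequences of integer class ids.
    Pixels whose ground-truth label equals `ignore_label` are skipped.
    Predictions are assumed to lie in the range [0, num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            # Correct pixel: counts once toward intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
# Per-class IoU: class 0 = 1/2, class 1 = 2/3, class 2 = 1/1.
print(f"mIoU = {100 * score:.1f}%")
```

Multiplying the returned fraction by 100 gives the percentage scale used in the table.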