Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
About
Self-training, which iteratively generates pseudo labels on unlabeled target data and retrains the network, has greatly facilitated domain adaptive semantic segmentation. However, because realistic segmentation datasets are highly imbalanced, pseudo labels are typically biased toward the majority classes and inherently noisy, leading to an error-prone and suboptimal model. In this paper, we propose a simple region-based active learning approach for semantic segmentation under domain shift, which automatically queries a small portion of image regions for labeling while maximizing segmentation performance. Our algorithm, Region Impurity and Prediction Uncertainty (RIPU), introduces a new acquisition strategy that characterizes the spatial adjacency of image regions along with the prediction confidence. We show that the proposed region-based selection strategy makes more efficient use of a limited annotation budget than its image-based or point-based counterparts. Further, we enforce local prediction consistency between a pixel and its nearest neighbors on source images. In addition, we develop a negative learning loss to make the features more discriminative. Extensive experiments demonstrate that our method requires only very few annotations to nearly reach supervised performance and substantially outperforms state-of-the-art methods. The code is available at https://github.com/BIT-DA/RIPU.
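The region-based acquisition idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the RIPU repository: it assumes a square (2k+1) × (2k+1) neighbourhood around each pixel, takes region impurity to be the entropy of the predicted-class histogram inside that neighbourhood, takes prediction uncertainty to be the mean per-pixel prediction entropy over the same neighbourhood, and scores each pixel by their product. The function names `pixel_entropy`, `region_scores`, and `select_top_regions` are hypothetical, and the greedy top-k selection ignores overlap between chosen regions.

```python
import math
from collections import Counter


def pixel_entropy(probs):
    """Shannon entropy (natural log) of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def region_scores(prob_map, k=1):
    """Hypothetical RIPU-style score: region impurity x prediction uncertainty.

    prob_map: H x W nested list; prob_map[y][x] is that pixel's list of class
        probabilities (e.g. a softmax output).
    k: half-width of the square (2k+1) x (2k+1) neighbourhood; windows are
        clipped at the image border.
    """
    h, w = len(prob_map), len(prob_map[0])
    # Hard predictions (argmax, first class wins ties) and per-pixel entropies.
    labels = [[max(range(len(p)), key=p.__getitem__) for p in row] for row in prob_map]
    entropy = [[pixel_entropy(p) for p in row] for row in prob_map]
    scores = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            cells = [
                (yy, xx)
                for yy in range(max(0, y - k), min(h, y + k + 1))
                for xx in range(max(0, x - k), min(w, x + k + 1))
            ]
            n = len(cells)
            # Assumed region impurity: entropy of the predicted-class histogram.
            counts = Counter(labels[yy][xx] for yy, xx in cells)
            impurity = -sum((c / n) * math.log(c / n) for c in counts.values())
            # Assumed prediction uncertainty: mean per-pixel entropy in the region.
            uncertainty = sum(entropy[yy][xx] for yy, xx in cells) / n
            scores[y][x] = impurity * uncertainty
    return scores


def select_top_regions(scores, budget):
    """Greedily return the `budget` region centres with the highest scores.

    Overlap between the chosen regions is ignored in this sketch.
    """
    ranked = sorted(
        ((s, y, x) for y, row in enumerate(scores) for x, s in enumerate(row)),
        key=lambda t: t[0],
        reverse=True,
    )
    return [(y, x) for _, y, x in ranked[:budget]]


if __name__ == "__main__":
    # Toy 4 x 4 two-class map: left half predicted class 0, right half class 1.
    c0, c1 = [0.9, 0.1], [0.1, 0.9]
    prob_map = [[c0, c0, c1, c1] for _ in range(4)]
    print(select_top_regions(region_scores(prob_map, k=1), budget=2))
```

In the toy example, the selected centres fall in columns 1 or 2, the only windows whose predictions mix both classes. A window lying entirely inside one predicted class has zero impurity and therefore a zero score no matter how uncertain its pixels are, which is how the product favours uncertain regions near predicted class boundaries. A practical version would vectorise the window statistics with array operations; the nested loops here favour readability.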
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | GTA5 → Cityscapes (val) | mIoU | 69.1 | 533 |
| Semantic segmentation | SYNTHIA to Cityscapes (val) | Rider IoU | 54.1 | 435 |
| Semantic segmentation | GTA5 to Cityscapes (test) | mIoU | 67.1 | 151 |
| Semantic segmentation | SYNTHIA-to-Cityscapes 16 categories (val) | mIoU (Overall) | 77.1 | 74 |
| Semantic segmentation | Cityscapes GTA5 source 1.0 (val) | mIoU | 71.2 | 49 |
| Semantic segmentation | GTA to Cityscapes (val) | Road Accuracy | 97 | 44 |
| Semantic segmentation | ACDC (val) | mIoU | 63.5 | 29 |