Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation
About
Semi-supervised semantic segmentation (SSS) has recently gained increasing research interest because it reduces the need for large-scale, fully annotated training data. Current methods often suffer from confirmation bias introduced by the pseudo-labelling process, which can be alleviated by a co-training framework. Existing co-training-based SSS methods rely on hand-crafted perturbations to prevent the sub-nets from collapsing into each other, but such artificial perturbations cannot lead to the optimal solution. In this work, we propose a new conflict-based cross-view consistency (CCVC) method built on a two-branch co-training framework, which aims to force the two sub-nets to learn informative features from irrelevant views. In particular, we first propose a new cross-view consistency (CVC) strategy that encourages the two sub-nets to learn distinct features from the same input by introducing a feature discrepancy loss, while these distinct features are still expected to produce consistent prediction scores for that input. The CVC strategy helps prevent the two sub-nets from collapsing. In addition, we propose a conflict-based pseudo-labelling (CPL) method to ensure that the model learns more useful information from conflicting predictions, which leads to a stable training process. We validate CCVC on SSS benchmark datasets, where it achieves new state-of-the-art performance. Our code is available at https://github.com/xiaoyao3302/CCVC.
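The CVC strategy combines two terms: a consistency term that asks both sub-nets to agree on the prediction, and a feature discrepancy loss that pushes their features apart. The following is a minimal single-pixel sketch in plain Python, not the authors' implementation. The discrepancy form `1 + cos(f_a, f_b)`, the use of cross pseudo supervision (each branch trained on the other's hard prediction) as the consistency term, and all function names are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def argmax(xs):
    """Index of the largest value (first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def feature_discrepancy_loss(feat_a, feat_b):
    """Hypothetical discrepancy loss 1 + cos(f_a, f_b), in [0, 2].
    Minimising it pushes the two sub-nets' features apart (towards cos = -1)."""
    return 1.0 + cosine_similarity(feat_a, feat_b)


def cvc_loss(feat_a, feat_b, logits_a, logits_b, weight=1.0):
    """Assumed CVC objective for one pixel:
    consistency = cross pseudo supervision (each branch learns the other's
    hard prediction), plus `weight` times the feature discrepancy loss."""
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    consistency = (-math.log(max(p_a[argmax(p_b)], 1e-12))
                   - math.log(max(p_b[argmax(p_a)], 1e-12)))
    return consistency + weight * feature_discrepancy_loss(feat_a, feat_b)
```

With orthogonal features the discrepancy term equals 1, with identical features it approaches 2, and with opposite features it approaches 0. Minimising it therefore drives the two views apart, while the consistency term keeps their predictions aligned.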
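Conflict-based pseudo-labelling can be illustrated with a small sketch that derives, for each pixel, the pseudo-label one branch receives from the other. It keeps only confident pixels and gives extra weight to pixels where the two branches' predictions conflict. The confidence threshold, the up-weighting factor, and the function names are hypothetical choices for illustration; the paper's exact CPL rule may differ.

```python
import math


def argmax(xs):
    """Index of the largest value (first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def conflict_pseudo_labels(probs_a, probs_b, threshold=0.95, conflict_weight=2.0):
    """Pseudo-labels that branch A receives from branch B, one per pixel.

    probs_a, probs_b: per-pixel class-probability lists from the two sub-nets.
    Returns (label, weight) pairs; weight 0.0 means the pixel is ignored.
    Hypothetical rule: use a pixel only if B's confidence reaches `threshold`,
    and up-weight it by `conflict_weight` if A's argmax disagrees with B's.
    """
    targets = []
    for pa, pb in zip(probs_a, probs_b):
        label_b = argmax(pb)
        if pb[label_b] < threshold:
            targets.append((label_b, 0.0))              # not confident: ignore
        elif argmax(pa) != label_b:
            targets.append((label_b, conflict_weight))  # conflicting prediction
        else:
            targets.append((label_b, 1.0))              # agreeing prediction
    return targets


def weighted_pseudo_label_loss(probs_a, targets):
    """Weighted cross-entropy of branch A against the pseudo-labels,
    normalised by the total weight of the pixels that are kept."""
    total, norm = 0.0, 0.0
    for pa, (label, w) in zip(probs_a, targets):
        if w > 0:
            total += w * -math.log(max(pa[label], 1e-12))
            norm += w
    return total / norm if norm > 0 else 0.0
```

For example, with `probs_a = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]` and `probs_b = [[0.97, 0.03], [0.02, 0.98], [0.5, 0.5]]`, the first pixel is an agreeing pseudo-label, the second is a conflicting one with double weight, and the third is ignored because branch B is not confident.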
Related benchmarks
| Task | Dataset | mIoU | Rank |
|---|---|---|---|
| Semantic segmentation | PASCAL VOC 2012 (val) | 79.6 | 166 |
| Semantic segmentation | PASCAL VOC classic 2012 (val) | -- | 143 |
| Semantic segmentation | Pascal VOC blended 2012 (train) | 79 | 96 |
| Semantic segmentation | Cityscapes 1/4 (744 labels) | 77.3 | 91 |
| Semantic segmentation | PASCAL VOC Augmented 2012 | 79 | 85 |
| Semantic segmentation | Cityscapes 1/16 (186 labeled samples) | 74.9 | 78 |
| Semantic segmentation | CITYSCAPES 1/8 labeled samples 372 labels (val) | 76.4 | 65 |
| Referring expression segmentation | RefCOCOg UMD (val) | 42.5 | 52 |
| Semantic segmentation | Pascal VOC Original protocol 92 labeled images | 70.2 | 48 |
| Referring expression segmentation | RefCOCOg UMD (test-u) | 43.49 | 46 |