PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering
About
We present a new framework for semantic segmentation without annotations via clustering. Off-the-shelf clustering methods are limited to curated, single-label, and object-centric images, yet real-world data are predominantly uncurated, multi-label, and scene-centric. We extend clustering from images to pixels and assign separate cluster membership to different instances within each image. However, relying solely on pixel-wise feature similarity fails to learn high-level semantic concepts and overfits to low-level visual cues. We propose a method to incorporate geometric consistency as an inductive bias to learn invariance and equivariance for photometric and geometric variations. With our novel learning objective, our framework can learn high-level semantic concepts. Our method, PiCIE (Pixel-level feature Clustering using Invariance and Equivariance), is the first method capable of segmenting both things and stuff categories without any hyperparameter tuning or task-specific pre-processing. Our method outperforms existing baselines on COCO and Cityscapes by a large margin (+17.5 Acc. and +4.5 mIoU). We show that PiCIE gives a better initialization for standard supervised training. The code is available at https://github.com/janghyuncho/PiCIE.
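To make the two properties named in the abstract concrete, here is a minimal NumPy sketch of what "invariance" and "equivariance" mean for pixel-level cluster labels. It is not the authors' network or training objective: the names `pixel_features`, `assign_clusters`, and `hflip`, the random projection `proj`, and the fixed `centroids` are all hypothetical stand-ins. The toy embedding is built so that both properties hold by construction, whereas PiCIE learns them.

```python
import numpy as np

def pixel_features(img, proj):
    """Toy per-pixel embedding (an assumption): linear map of RGB, then L2-normalise."""
    feats = img @ proj                                   # (H, W, 3) @ (3, D) -> (H, W, D)
    return feats / np.linalg.norm(feats, axis=-1, keepdims=True)

def assign_clusters(feats, centroids):
    """Label every pixel with the index of its nearest centroid."""
    # Broadcast (H, W, 1, D) against (K, D) to get per-pixel distances of shape (H, W, K).
    dists = np.linalg.norm(feats[..., None, :] - centroids, axis=-1)
    return dists.argmin(axis=-1)

def hflip(x):
    """Geometric transform: horizontal flip along the width axis."""
    return x[:, ::-1]

rng = np.random.default_rng(0)
img = rng.uniform(0.1, 1.0, size=(8, 8, 3))   # synthetic image, strictly positive pixels
proj = rng.normal(size=(3, 4))                # hypothetical feature projection
centroids = rng.normal(size=(5, 4))           # hypothetical cluster centres

labels = assign_clusters(pixel_features(img, proj), centroids)

# Invariance: a photometric change (uniform brightness scaling) should leave labels unchanged.
labels_photo = assign_clusters(pixel_features(1.5 * img, proj), centroids)
invariant = np.array_equal(labels, labels_photo)

# Equivariance: labelling a flipped image should equal flipping the labels.
labels_of_flipped = assign_clusters(pixel_features(hflip(img), proj), centroids)
equivariant = np.array_equal(labels_of_flipped, hflip(labels))

print("photometric invariance holds:", invariant)
print("geometric equivariance holds:", equivariant)
```

A learned convolutional feature extractor does not satisfy either check automatically, which is why the abstract frames them as an inductive bias to be learned.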
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic segmentation | Cityscapes (test) | mIoU | 12.3 | 1145 |
| Semantic segmentation | S3DIS (Area 5) | mIoU | 17.9 | 799 |
| Semantic segmentation | Cityscapes | mIoU | 12.3 | 578 |
| Semantic segmentation | Cityscapes (val) | mIoU | 12.3 | 572 |
| Semantic segmentation | COCO Stuff | mIoU | 1.44e+3 | 195 |
| Semantic segmentation | Coco-Stuff (test) | mIoU | 14.8 | 184 |
| Semantic segmentation | COCO Stuff (val) | mIoU | 14.4 | 126 |
| 3D Semantic Segmentation | ScanNet (val) | mIoU | 7.6 | 100 |
| Semantic segmentation | COCO Stuff-27 (val) | mIoU | 1.44e+3 | 75 |
| Semantic segmentation | Cityscapes-C (val) | mIoU | 10.3 | 56 |