DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision
About
We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision. Specifically, we propose a self-ensembling framework in which instance segmentation and semantic correspondence are jointly guided by a structured teacher in addition to the bounding box supervision. The teacher is a structured energy model that incorporates a pairwise potential and a cross-image potential to model pairwise pixel relationships both within and across boxes. Minimizing the teacher energy simultaneously yields refined object masks and dense correspondences between intra-class objects. These serve as pseudo-labels that supervise the task network and provide positive/negative correspondence pairs for dense contrastive learning. We show a symbiotic relationship in which the two tasks mutually benefit each other. Our best model achieves 37.9% AP on COCO instance segmentation, surpassing prior weakly supervised methods and remaining competitive with supervised methods. We also obtain state-of-the-art weakly supervised results on PASCAL VOC12 and PF-PASCAL with real-time inference.
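To make the moving parts of the abstract concrete, here is a minimal NumPy sketch of three ingredients. It is not the authors' implementation, and every name in it (`ema_update`, `refine_mask`, `dense_info_nce`, `_shift`) is hypothetical. It rests on these assumptions:

* "Self-ensembling" is modeled as a mean-teacher whose weights are an exponential moving average (EMA) of the student.
* The within-box pairwise potential is modeled as a contrast-sensitive Potts term on a 4-neighborhood. This term is minimized approximately with mean-field updates, and pixels outside the box are fixed to background.
* Dense contrastive learning is modeled as an InfoNCE loss over pixel embeddings.

The cross-image potential is omitted, so correspondence pairs are taken as given.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.999):
    """Mean-teacher style update (assumed): teacher params track an EMA of the student."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k] for k in teacher}

def _shift(a, dy, dx):
    """Return b with b[y, x] = a[y + dy, x + dx]; out-of-bounds entries are 0."""
    out = np.zeros_like(a)
    h, w = a.shape[:2]
    out[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)] = \
        a[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    return out

def refine_mask(unary_fg, image, box_mask, w_pair=2.0, sigma=0.1, iters=5):
    """Mean-field minimization of a unary + pairwise (Potts) energy for a binary mask.

    unary_fg: (H, W) foreground probabilities predicted by the network.
    image:    (H, W, C) colors in [0, 1], used for the pairwise affinities.
    box_mask: (H, W) bool, True inside the box; outside pixels are forced to background.
    """
    eps = 1e-6
    p = np.clip(unary_fg, eps, 1.0 - eps)
    logit_u = np.log(p) - np.log(1.0 - p)
    ones = np.ones(p.shape)
    # Color affinity to each of the 4 neighbors; zero where the neighbor is out of bounds.
    neighbors = []
    for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
        diff = image - _shift(image, dy, dx)
        aff = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2)) * _shift(ones, dy, dx)
        neighbors.append((dy, dx, aff))
    q = np.where(box_mask, p, 0.0)
    for _ in range(iters):
        # For binary labels, the Potts mean-field message is aff * (2 * q_neighbor - 1).
        msg = sum(aff * (2.0 * _shift(q, dy, dx) - 1.0) for dy, dx, aff in neighbors)
        q = 1.0 / (1.0 + np.exp(-(logit_u + w_pair * msg)))
        q = np.where(box_mask, q, 0.0)  # box constraint
    return q

def dense_info_nce(feat_a, feat_b, pos_pairs, tau=0.1):
    """InfoNCE over pixel embeddings from two images.

    feat_a: (N, D), feat_b: (M, D) pixel features.
    pos_pairs: (i, j) pairs meaning pixel i of A corresponds to pixel j of B;
    every other pixel of B acts as a negative for pixel i.
    """
    feat_a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    feat_b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    logits = feat_a @ feat_b.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    i, j = np.asarray(pos_pairs).T
    return float(-log_prob[i, j].mean())
```

In a training loop built on these assumptions, the EMA teacher's mask predictions would be refined inside each box with `refine_mask`. The result then serves as the student's pseudo-label, and teacher-derived correspondences feed `dense_info_nce`. Mean-field is used here only because it is a simple, standard way to approximately minimize such an energy.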
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Instance Segmentation | COCO 2017 (val) | APm | 0.338 | 1144 |
| Instance Segmentation | COCO (val) | APmk | 32 | 472 |
| Instance Segmentation | COCO (test-dev) | APM | 41.1 | 380 |
| Instance Segmentation | PASCAL VOC 2012 (val) | mAP @0.5 | 63.6 | 173 |
| Instance Segmentation | PASCAL VOC (val) | AP@0.50 | 63.6 | 24 |
| Instance Segmentation | COCO 49 (val) | AP | 31.4 | 20 |
| Instance Segmentation | VOC 2012 (test) | AP @ IoU=0.50 | 62.2 | 13 |
| Instance Segmentation | iSAID 1.0 (val) | AP | 22.6 | 13 |
| Semantic Correspondence | PF-PASCAL (val) | PCK @ 0.05 | 59.3 | 8 |
| Semantic Correspondence | PASCAL 3D+ (test) | AP | 0.317 | 4 |
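The PCK @ 0.05 entry is a percentage-of-correct-keypoints score. A transferred keypoint counts as correct when it lands within 0.05 times a reference length of its ground-truth location. The sketch below assumes that reference length is max(height, width) of the image or object box, which depends on the evaluation protocol. The function name `pck` is hypothetical.

```python
import numpy as np

def pck(pred, gt, ref_size, alpha=0.05):
    """Fraction of keypoints within alpha * ref_size of ground truth.

    pred, gt: (K, 2) arrays of (x, y) keypoint locations.
    ref_size: normalizing length, e.g. max(h, w) of the image or box (protocol-dependent).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dist <= alpha * ref_size))

# Image of size 200 x 100, so the threshold is 0.05 * max(200, 100) = 10 pixels.
gt = [(50, 50), (80, 20), (120, 60)]
pred = [(50, 50), (83, 24), (120, 75)]  # errors of 0, 5 and 15 pixels
print(pck(pred, gt, ref_size=max(200, 100)))  # prints 0.6666666666666666
```

Multiplying by 100 gives the percentage form used in the table.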