DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping
About
We describe an unsupervised method to detect and segment portions of images of live scenes that, at some point in time, are seen moving as a coherent whole, which we refer to as objects. Our method first partitions the motion field by minimizing the mutual information between segments. Then, it uses the segments to learn object models that can be used for detection in a static image. Static and dynamic models are represented by deep neural networks trained jointly in a bootstrapping strategy, which enables extrapolation to previously unseen objects. While the training process requires motion, the resulting object segmentation network can be used on either static images or videos at inference time. As the volume of seen videos grows, more and more objects are seen moving, priming their detection, which then serves as a regularizer for new objects, turning our method into unsupervised continual learning to segment objects. Our models are compared to the state of the art in both video object segmentation and salient object detection. In the six benchmark datasets tested, our models compare favorably even to those using pixel-level supervision, despite requiring no manual annotation.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Object Segmentation | DAVIS 2016 (val) | -- | 564 |
| Salient Object Detection | ECSSD | -- | 202 |
| Unsupervised Video Object Segmentation | DAVIS 2016 (val) | -- | 108 |
| Unsupervised Video Object Segmentation | SegTrack v2 | Jaccard Score: 74.2 | 56 |
| Video Object Segmentation | DAVIS 2016 | -- | 44 |
| Unsupervised Video Object Segmentation | FBMS59 | Jaccard Score: 73.2 | 43 |
| Video Object Segmentation | SegTrack v2 (test) | J Mean: 74.2 | 40 |
| Video Object Segmentation | SegTrack v2 | IoU (J): 74.2 | 34 |
| Video Object Segmentation | DAVIS 2016 (test) | -- | 29 |
| Single Object Video Segmentation | SegTrack v2 (val) | J Mean: 74.2 | 27 |
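
The Jaccard score, J Mean, and IoU (J) entries in the table all refer to the same quantity: the intersection over union between a predicted binary mask and the ground-truth mask. The sketch below shows one way to compute it in plain Python. The function names `jaccard` and `mean_jaccard` are illustrative and not taken from the DyStaB code, the handling of two empty masks is an assumed convention, and the averaging shown (a simple mean over frames) may differ from the exact protocol each benchmark uses.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (illustrative type alias).
Mask = Sequence[Sequence[int]]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p, g = bool(p), bool(g)
            inter += p and g  # pixel is foreground in both masks
            union += p or g   # pixel is foreground in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_jaccard(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average the per-frame Jaccard score over a sequence of frames."""
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
score = jaccard(pred, gt)  # 2 shared pixels out of 4 in the union
```

Benchmark tables such as the one above typically report this value as a percentage, so a score of 0.742 would appear as 74.2. For full-resolution masks, an array library would normally replace the nested loops.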