Unsupervised Moving Object Detection via Contextual Information Separation
About
We propose an adversarial contextual model for detecting moving objects in images. A deep neural network is trained to predict the optical flow in a region using information from everywhere else in the image except that region (its context), while another network chooses the region so as to make this context as uninformative as possible. The result is a model in which hypotheses naturally compete, with no need for explicit regularization or hyper-parameter tuning. Although our method requires no supervision whatsoever, it outperforms several methods that are pre-trained on large annotated datasets. Our model can be thought of as a generalization of classical variational generative region-based segmentation, but one that avoids explicit regularization and the solution of partial differential equations at run-time.
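The adversarial objective described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses NumPy and replaces the inpainting network with a hypothetical `context_mean_inpainter`, which predicts the flow inside a region as the average flow of its context. A hypothetical symmetric `separation_score` serves as the quantity that the region-selecting network would maximize and the inpainter would minimize: it adds the error of inpainting the region from its context to the error of the reverse direction. The sketch leaves out the image input, any normalization the paper may use, and the actual network architectures.

```python
import numpy as np

def context_mean_inpainter(flow, mask):
    """Hypothetical stand-in for the inpainting network.

    Predicts the flow inside the soft region `mask` (values in [0, 1]) as the
    weighted mean flow of its context (1 - mask). Toy model only.
    """
    context = 1.0 - mask
    weight = context.sum()
    if weight == 0:
        # Empty context: nothing to inpaint from, so predict zero flow.
        return np.zeros_like(flow)
    mean_flow = (flow * context[..., None]).sum(axis=(0, 1)) / weight
    return np.broadcast_to(mean_flow, flow.shape)

def inpainting_error(flow, mask, inpainter=context_mean_inpainter):
    """Mean squared flow error of the inpainter inside the region `mask`."""
    pred = inpainter(flow, mask)
    sq_err = ((flow - pred) ** 2).sum(axis=-1)  # per-pixel squared error
    area = mask.sum()
    return float((sq_err * mask).sum() / area) if area > 0 else 0.0

def separation_score(flow, mask):
    """Hypothetical symmetric score: region-from-context error plus
    context-from-region error. A mask generator would maximize it."""
    return inpainting_error(flow, mask) + inpainting_error(flow, 1.0 - mask)

# Toy flow field: background moves right, a square object moves up-left.
H, W = 32, 32
flow = np.zeros((H, W, 2))
flow[..., 0] = 1.0                # background flow (1, 0)
flow[8:16, 8:16] = (-2.0, 1.0)    # independently moving object

object_mask = np.zeros((H, W))
object_mask[8:16, 8:16] = 1.0     # region covering exactly the object
off_target = np.zeros((H, W))
off_target[20:28, 20:28] = 1.0    # region covering only background

print(separation_score(flow, object_mask))  # 20.0
print(separation_score(flow, off_target))
```

With this toy inpainter, the mask that covers exactly the independently moving square scores far higher than a mask over plain background. Context drawn from the background says nothing about the object's motion, and the reverse also holds. This is the sense in which the object and background hypotheses compete.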
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Video Object Segmentation | DAVIS 2016 (val) | J Mean | 71.5 | 564 |
| Unsupervised Video Object Segmentation | DAVIS 2016 (val) | -- | -- | 108 |
| Unsupervised Video Object Segmentation | SegTrack v2 | Jaccard Score | 62 | 56 |
| Video Object Segmentation | DAVIS 2016 | J-Measure | 71.5 | 44 |
| Unsupervised Video Object Segmentation | FBMS59 | Jaccard Score | 63.6 | 43 |
| Video Object Segmentation | SegTrack v2 (test) | J Mean | 62 | 40 |
| Video Object Segmentation | SegTrack v2 | IoU (J) | 62 | 34 |
| Video Object Segmentation | DAVIS 2016 (test) | -- | -- | 29 |
| Single Object Video Segmentation | SegTrack v2 (val) | J Mean | 62 | 27 |
| Moving Object Segmentation | DAVIS Moving 2016 | Jaccard Index | 70.3 | 26 |
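Most results in the table are Jaccard scores (J, the intersection over union between predicted and ground-truth masks), reported as percentages. The following is a minimal sketch of how a per-sequence J Mean might be computed, assuming one binary mask per frame. `jaccard` and `mean_jaccard` are illustrative names. The convention for two empty masks is an assumption. Benchmark-specific details are not modeled, such as which frames are evaluated and how scores are averaged across sequences.

```python
import numpy as np

def jaccard(pred, gt):
    """Jaccard index (intersection over union) of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def mean_jaccard(pred_frames, gt_frames):
    """Average per-frame Jaccard over one sequence, in percent."""
    scores = [jaccard(p, g) for p, g in zip(pred_frames, gt_frames)]
    return 100.0 * float(np.mean(scores))

gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True     # 4 ground-truth pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True   # 6 predicted pixels, 4 of them correct
print(jaccard(pred, gt))  # 4 / 6, about 0.667
```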