Deformable Sprites for Unsupervised Video Decomposition
About
We describe a method to extract persistent elements of a dynamic scene from an input video. We represent each scene element as a Deformable Sprite consisting of three components: 1) a 2D texture image for the entire video, 2) per-frame masks for the element, and 3) non-rigid deformations that map the texture image into each video frame. The resulting decomposition allows for applications such as consistent video editing. Deformable Sprites are a type of video auto-encoder model optimized on individual videos; they do not require training on a large dataset, nor do they rely on pre-trained models. Moreover, our method does not require object masks or other user input, and discovers moving objects of a wider variety than previous work. We evaluate our approach on standard video datasets and show qualitative results on a diverse array of Internet videos. Code and video results can be found at https://deformable-sprites.github.io
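To make the three components concrete, here is a minimal NumPy sketch of how a frame could be recomposed from a set of sprites. It is an illustration under stated assumptions, not the authors' implementation: the `Sprite` class, the `render_sprite` and `reconstruct_frame` functions, the representation of a deformation as a per-pixel lookup table into the texture, nearest-neighbour sampling, and mask normalization are all hypothetical choices made for brevity.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Sprite:
    """Hypothetical container for one scene element (names are assumptions)."""
    texture: np.ndarray  # (Ht, Wt, 3): one texture image shared by all frames
    masks: np.ndarray    # (T, H, W): per-frame soft masks in [0, 1]
    coords: np.ndarray   # (T, H, W, 2): per-frame (row, col) lookup into texture


def render_sprite(sprite: Sprite, t: int) -> np.ndarray:
    """Warp the texture into frame t with a nearest-neighbour lookup."""
    ht, wt = sprite.texture.shape[:2]
    rc = np.rint(sprite.coords[t]).astype(int)
    rows = np.clip(rc[..., 0], 0, ht - 1)
    cols = np.clip(rc[..., 1], 0, wt - 1)
    return sprite.texture[rows, cols]  # (H, W, 3)


def reconstruct_frame(sprites: list, t: int) -> np.ndarray:
    """Blend the warped textures, with masks normalized to sum to 1 per pixel."""
    weights = np.stack([s.masks[t] for s in sprites])           # (L, H, W)
    weights = weights / np.maximum(weights.sum(axis=0, keepdims=True), 1e-8)
    layers = np.stack([render_sprite(s, t) for s in sprites])   # (L, H, W, 3)
    return (weights[..., None] * layers).sum(axis=0)            # (H, W, 3)


# Toy example: a bright square sprite over a dark background sprite.
T, H, W = 2, 4, 4
rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
identity = np.broadcast_to(np.stack([rows, cols], -1), (T, H, W, 2)).astype(float)
fg_mask = np.zeros((T, H, W))
fg_mask[:, 1:3, 1:3] = 1.0
fg = Sprite(np.ones((H, W, 3)), fg_mask, identity)
bg = Sprite(np.full((H, W, 3), 0.2), 1.0 - fg_mask, identity)
frame = reconstruct_frame([bg, fg], t=0)
```

Because every frame samples the same texture, editing `fg.texture` once changes the element's appearance in all frames, which is the sense in which such a decomposition supports consistent video editing.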
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Unsupervised Video Object Segmentation | DAVIS 2016 (val) | -- | -- | 108 |
| Unsupervised Video Object Segmentation | SegTrack v2 | Jaccard Score | 72.1 | 56 |
| Unsupervised Video Object Segmentation | DAVIS 2016 (test) | J Mean | 79.1 | 50 |
| Video Object Segmentation | DAVIS 2016 | -- | -- | 44 |
| Unsupervised Video Object Segmentation | FBMS59 | Jaccard Score | 71.8 | 43 |
| Video Object Segmentation | SegTrack v2 (test) | J Mean | 72.1 | 40 |
| Video Object Segmentation | SegTrack v2 | IoU (J) | 72.1 | 34 |
| Video Object Segmentation | DAVIS 2016 (test) | -- | -- | 29 |
| Single Object Video Segmentation | SegTrack v2 (val) | J Mean | 72.1 | 27 |
| Unsupervised Video Object Segmentation | DAVIS 2016 | Jaccard Score | 79.1 | 24 |