Deformable Sprites for Unsupervised Video Decomposition

About

We describe a method to extract persistent elements of a dynamic scene from an input video. We represent each scene element as a Deformable Sprite consisting of three components: 1) a 2D texture image for the entire video, 2) per-frame masks for the element, and 3) non-rigid deformations that map the texture image into each video frame. The resulting decomposition allows for applications such as consistent video editing. Deformable Sprites are a type of video auto-encoder model that is optimized on individual videos, and does not require training on a large dataset, nor does it rely on pre-trained models. Moreover, our method does not require object masks or other user input, and discovers moving objects of a wider variety than previous work. We evaluate our approach on standard video datasets and show qualitative results on a diverse array of Internet videos. Code and video results can be found at https://deformable-sprites.github.io
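To make the three components concrete, the sketch below shows one plausible way a frame could be re-assembled from a set of sprites: each layer's texture is sampled through its per-frame deformation and weighted by its per-frame mask. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how the deformations are parametrized or how layers are blended, so the names `render_frame` and `sample_nearest`, the per-pixel warp callback, the nearest-neighbor lookup, and the assumption that masks sum to 1 at every pixel are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # grayscale image stored as rows of floats
# Hypothetical warp type: maps a frame pixel (x, y) to texture coordinates (u, v).
Warp = Callable[[int, int], Tuple[float, float]]


def sample_nearest(texture: Image, u: float, v: float) -> float:
    """Nearest-neighbor texture lookup, clamped to the texture bounds."""
    height, width = len(texture), len(texture[0])
    col = min(max(int(round(u)), 0), width - 1)
    row = min(max(int(round(v)), 0), height - 1)
    return texture[row][col]


def render_frame(
    textures: Sequence[Image],
    masks: Sequence[Image],
    warps: Sequence[Warp],
    height: int,
    width: int,
) -> Image:
    """Composite one frame as the mask-weighted sum of warped textures.

    Assumption (not from the source): the masks of all layers sum to 1
    at every pixel, so the result is a convex blend of the layers.
    """
    frame = [[0.0] * width for _ in range(height)]
    for texture, mask, warp in zip(textures, masks, warps):
        for y in range(height):
            for x in range(width):
                u, v = warp(x, y)
                frame[y][x] += mask[y][x] * sample_nearest(texture, u, v)
    return frame
```

Because the texture is shared across the whole video while only the masks and warps change per frame, an edit painted once onto a texture would propagate to every frame that samples it, which is the property behind the consistent video editing mentioned above.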

Vickie Ye, Zhengqi Li, Richard Tucker, Angjoo Kanazawa, Noah Snavely • 2022

Related benchmarks

Task	Dataset	Result
Unsupervised Video Object Segmentation	DAVIS 2016 (val)	--	108
Unsupervised Video Object Segmentation	SegTrack v2	Jaccard Score 72.1	56
Video Object Segmentation	DAVIS 2016	--	53
Unsupervised Video Object Segmentation	DAVIS 2016 (test)	J Mean 79.1	50
Unsupervised Video Object Segmentation	FBMS59	Jaccard Score 71.8	43
Video Object Segmentation	SegTrack v2	--	42
Video Object Segmentation	SegTrack v2 (test)	J Mean 72.1	40
Video Reconstruction	DAVIS	--	33
Unsupervised Video Object Segmentation	DAVIS 2016	Jaccard Score 79.1	32
Video Object Segmentation	DAVIS 2016 (test)	--	29

Showing 10 of 19 rows
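
The Jaccard Score and J Mean results in the table are region-similarity measures: the Jaccard index of a predicted mask and a ground-truth mask is the size of their intersection divided by the size of their union, and J Mean averages that value over frames. A minimal sketch of the computation follows. The function names `jaccard` and `j_mean` are illustrative, returning 1.0 when both masks are empty is an assumed convention, and the exact averaging protocol of each benchmark (for example per sequence versus per frame) is not stated on this page.

```python
from typing import Sequence

Mask = Sequence[Sequence[bool]]  # binary mask stored as rows of truthy/falsy values


def jaccard(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of equal size."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            intersection += bool(p) and bool(g)
            union += bool(p) or bool(g)
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def j_mean(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Mean Jaccard index over paired predicted and ground-truth frames."""
    scores = [jaccard(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)
```

Benchmarks typically report the result as a percentage, so a mean of 0.791 from `j_mean` would correspond to a table entry of 79.1.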
