
Tracking Emerges by Colorizing Videos

About

We use large amounts of unlabeled video to learn models for visual tracking without human supervision. We leverage the natural temporal coherency of color to create a model that learns to colorize gray-scale videos by copying colors from a reference frame. Quantitative and qualitative experiments suggest that this task causes the model to automatically learn to track visual regions. Although the model is trained without any ground-truth labels, it learns to track well enough to outperform the latest methods based on optical flow. Moreover, our results suggest that failures to track are correlated with failures to colorize, indicating that advancing video colorization may further improve self-supervised visual tracking.
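The core mechanism described above is a soft "pointer": each pixel of the gray-scale target frame attends over reference-frame pixels and copies a weighted average of their colors. Below is a minimal NumPy sketch of that copy operation; the function name, shapes, and temperature parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def copy_colors(f_ref, f_tgt, colors_ref, temperature=1.0):
    """Predict target-frame colors by softly copying from a reference frame.

    f_ref:      (N, D) embeddings of reference-frame pixels
    f_tgt:      (M, D) embeddings of target (gray-scale) frame pixels
    colors_ref: (N, C) colors (or, at test time, labels) of reference pixels
    Returns the (M, C) predicted colors and the (M, N) attention matrix.
    """
    # Similarity between every target pixel and every reference pixel.
    logits = f_tgt @ f_ref.T / temperature
    # Softmax over reference pixels: each target pixel "points" at references.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    # Copy step: predicted color is a weighted average of reference colors.
    return attn @ colors_ref, attn
```

The same pointer enables tracking at test time: replacing `colors_ref` with one-hot segmentation masks or keypoint maps propagates them to new frames with no further training, which is how the benchmark results below are obtained.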

Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy • 2018

Related benchmarks

Task | Dataset | Metric | Result | Rank
Video Object Segmentation | DAVIS 2017 (val) | J Mean | 34.6 | 1130
Video Object Segmentation | YouTube-VOS 2018 (val) | J Score (Seen) | 43.1 | 493
Video Object Segmentation | YouTube-VOS 2019 (val) | J Score (Seen) | 43.3 | 231
One-shot Video Object Segmentation | DAVIS 2016 (val) | J Mean | 38.9 | 28
Video Label Propagation | JHMDB (val) | PCK@0.1 | 45.2 | 17
Human Pose Tracking | JHMDB (val) | PCK@0.1 | 45.2 | 15
Instance Segmentation Propagation | DAVIS 2017 | J Mean | 34.6 | 14
Human Pose Tracking | JHMDB (split1) | PCK@0.1 | 45.2 | 11
One-shot Video Object Segmentation | DAVIS 2017 (val) | J&F Mean | 34 | 11
Pose Keypoint Propagation | JHMDB split 1 (val) | PCK@0.1 | 45.2 | 10
