
Learning Correspondence from the Cycle-Consistency of Time

About

We introduce a self-supervised method for learning visual correspondence from unlabeled video. The main idea is to use cycle-consistency in time as a free supervisory signal for learning visual representations from scratch. At training time, our model learns a feature map representation that is useful for cycle-consistent tracking. At test time, we use the acquired representation to find nearest neighbors across space and time. We demonstrate the generalizability of the representation -- without finetuning -- across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow. Our approach outperforms previous self-supervised methods and performs competitively with strongly supervised methods.
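To make the two ideas in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes each frame is already encoded as an array of L2-normalised per-location features. The function names, the softmax temperature of 0.07, and the top-k value are all illustrative assumptions that do not come from the paper. `cycle_consistency_error` tracks a location forward through the frames and back again and measures how far it lands from where it started, which is the kind of signal a training loss could penalise. `propagate_labels` shows the test-time use: copying labels from a reference frame to a target frame through nearest neighbors in feature space.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def affinity(feat_a, feat_b, temperature=0.07):
    """Row-stochastic affinity from every location in A to every location in B.

    feat_a: (N, C), feat_b: (M, C); rows are assumed to be L2-normalised.
    The temperature value is an assumption, not taken from the paper.
    """
    return softmax(feat_a @ feat_b.T / temperature, axis=1)

def cycle_consistency_error(frames, coords, query_idx, temperature=0.07):
    """Track one location forward through `frames` and back to the first frame.

    frames: list of (N, C) feature arrays, one per time step.
    coords: (N, 2) spatial coordinates of the N feature locations.
    Returns the distance between the start point and the expected end point.
    """
    path = frames + frames[-2::-1]      # t0, t1, ..., tk, ..., t1, t0
    p = np.zeros(len(coords))
    p[query_idx] = 1.0                  # soft position, one-hot at the query
    for a, b in zip(path[:-1], path[1:]):
        p = p @ affinity(a, b, temperature)
    end = p @ coords                    # expected location after the full cycle
    return float(np.linalg.norm(end - coords[query_idx]))

def propagate_labels(feat_ref, labels_ref, feat_tgt, k=5, temperature=0.07):
    """Copy labels from a reference frame to a target frame via top-k neighbors.

    feat_ref: (N, C), labels_ref: (N, K) one-hot or soft labels,
    feat_tgt: (M, C). Returns (M, K) soft labels for the target frame.
    """
    sim = feat_tgt @ feat_ref.T / temperature
    k = min(k, sim.shape[1])
    top = np.argpartition(-sim, k - 1, axis=1)[:, :k]   # k most similar refs
    w = softmax(np.take_along_axis(sim, top, axis=1), axis=1)
    return np.einsum("mk,mkc->mc", w, labels_ref[top])
```

In this sketch a representation that tracks well yields a small cycle error, and the same affinities are reused at test time with no task-specific training, mirroring the "without finetuning" claim above.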

Xiaolong Wang, Allan Jabri, Alexei A. Efros • 2019

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Object Segmentation | DAVIS 2017 (val) | J mean | 46.4 | 1130 |
| Semantic segmentation | VOC 2012 (val) | mIoU | 52.8 | 67 |
| One-shot Video Object Segmentation | DAVIS 2016 (val) | J mean | 55.8 | 28 |
| Pose Propagation | JHMDB | PCK@0.1 | 57.7 | 20 |
| Video label propagation | JHMDB (val) | PCK@0.1 | 57.3 | 17 |
| Human Pose Tracking | JHMDB (val) | PCK@0.1 | 57.3 | 15 |
| Instance Segmentation Propagation | DAVIS 2017 | J mean | 46.4 | 14 |
| Human Part Propagation | VIP (val) | mIoU | 28.9 | 12 |
| Human Pose Tracking | JHMDB (split1) | PCK@0.1 | 57.3 | 11 |
| One-shot Video Object Segmentation | DAVIS 2017 (val) | J&F mean | 42.8 | 11 |
Showing 10 of 18 rows
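
For readers unfamiliar with the metrics listed above, the following is a minimal sketch of how a J (Jaccard, i.e. intersection-over-union) score and a PCK@0.1 score are commonly computed. It is not the official evaluation code of any benchmark. The function names are hypothetical, and two choices are assumptions: treating two empty masks as a perfect match, and passing the PCK reference size in as a plain number. Each benchmark defines its own reference size, for example in terms of a person bounding box.

```python
import numpy as np

def jaccard(pred_mask, gt_mask):
    """Region similarity J: intersection over union of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)

def pck(pred_kpts, gt_kpts, ref_size, alpha=0.1):
    """Fraction of keypoints within alpha * ref_size of the ground truth.

    pred_kpts, gt_kpts: (K, 2) arrays of keypoint coordinates.
    ref_size: benchmark-specific normalising length (an assumption here).
    """
    pred = np.asarray(pred_kpts, dtype=float)
    gt = np.asarray(gt_kpts, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float((dist <= alpha * ref_size).mean())
```

A "J mean" figure is then the average of `jaccard` over the evaluated objects and frames, and PCK@0.1 corresponds to `alpha=0.1`.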
