Space-Time Correspondence as a Contrastive Random Walk
About
This paper proposes a simple self-supervised approach for learning a representation for visual correspondence from raw video. We cast correspondence as prediction of links in a space-time graph constructed from video. In this graph, the nodes are patches sampled from each frame, and nodes adjacent in time can share a directed edge. We learn a representation in which pairwise similarity defines transition probability of a random walk, so that long-range correspondence is computed as a walk along the graph. We optimize the representation to place high probability along paths of similarity. Targets for learning are formed without supervision by cycle-consistency: the objective is to maximize the likelihood of returning to the initial node when walking along a graph constructed from a palindrome of frames. Thus, a single path-level constraint implicitly supervises chains of intermediate comparisons. When used as a similarity metric without adaptation, the learned representation outperforms the self-supervised state-of-the-art on label propagation tasks involving objects, semantic parts, and pose. Moreover, we demonstrate that both a technique we call edge dropout and self-supervised adaptation at test time further improve transfer for object-centric correspondence.
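The training objective described above can be sketched in a few lines. This is a minimal, dependency-free Python sketch, not the authors' implementation. It assumes each frame is already given as a list of node embeddings (in the paper these come from an encoder applied to patches); the temperature default `tau=0.07` is an assumed value; and the edge-dropout variant, which masks a random fraction of edges before row-normalization, is one plausible reading of the technique and is labeled hypothetical. The helper names `transition`, `cycle_walk`, and `cycle_loss` are illustrative. The sketch walks one full palindrome (frames 1..T..1) and scores cycle-consistency as the cross-entropy between the round-trip transition matrix and the identity.

```python
import math
import random


def l2_normalize(v):
    # Unit-normalize so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def matmul(A, B):
    # Plain list-of-lists matrix product A @ B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transition(src, dst, tau=0.07, drop_rate=0.0, rng=None):
    """Row-stochastic matrix with P[i][j] = softmax_j(<src_i, dst_j> / tau)."""
    P = []
    for a in src:
        logits = [sum(x * y for x, y in zip(a, b)) / tau for b in dst]
        if drop_rate > 0:
            # Hypothetical edge dropout: mask a random subset of outgoing edges
            # before normalizing, always keeping at least one edge per node.
            keep = [rng.random() >= drop_rate for _ in logits]
            if not any(keep):
                keep[rng.randrange(len(keep))] = True
            logits = [z if k else -math.inf for z, k in zip(logits, keep)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]  # masked edges get weight 0
        s = sum(exps)
        P.append([e / s for e in exps])
    return P


def cycle_walk(frames, tau=0.07, drop_rate=0.0, rng=None):
    """Chain transitions along the palindrome 1..T..1 and return their product."""
    rng = rng or random.Random(0)
    frames = [[l2_normalize(v) for v in f] for f in frames]
    path = frames + frames[-2::-1]  # e.g. [f0, f1, f2] -> [f0, f1, f2, f1, f0]
    P = None
    for src, dst in zip(path[:-1], path[1:]):
        A = transition(src, dst, tau, drop_rate, rng)
        P = A if P is None else matmul(P, A)
    return P


def cycle_loss(frames, tau=0.07, drop_rate=0.0, rng=None):
    """Cross-entropy between the round-trip walk and the identity (return home)."""
    P = cycle_walk(frames, tau, drop_rate, rng)
    n = len(P)
    return -sum(math.log(max(P[i][i], 1e-12)) for i in range(n)) / n


# Toy "video": 3 frames, 4 nodes per frame, 8-dimensional random features.
rng = random.Random(0)
video = [[[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)] for _ in range(3)]
print(cycle_loss(video))                 # lower means walkers return home more reliably
print(cycle_loss(video, drop_rate=0.3))  # same walk with hypothetical edge dropout
```

In training, gradients of this loss would flow back into the encoder that produces the node embeddings. Because every step is a row-stochastic matrix, the same transitions can, roughly speaking, propagate labels at test time by weighting source-frame labels with transition probabilities; the paper's exact propagation procedure is not reproduced here.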
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Video Object Segmentation | DAVIS 2017 (val) | J mean | 64.8 | 1130 |
| Video Object Segmentation | YouTube-VOS 2018 (val) | J Score (Seen) | 68.7 | 493 |
| Video Object Segmentation | DAVIS 2017 (test) | J (Jaccard Index) | 72.9 | 107 |
| Medical Image Segmentation | CVC-ClinicDB | Dice Score | 82.92 | 68 |
| Video Object Segmentation | DAVIS 2017 | Jaccard Index (J) | 72.9 | 42 |
| Pose Propagation | JHMDB | PCK@0.1 | 59.3 | 20 |
| Video Label Propagation | JHMDB (val) | PCK@0.1 | 58.8 | 17 |
| Human Pose Tracking | JHMDB (val) | PCK@0.1 | 59.3 | 15 |
| Video Object Segmentation | VOST 1.0 (test) | J_tr | 13.9 | 13 |
| Human Part Propagation | VIP (val) | mIoU | 38.6 | 12 |
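For reference, the metrics in the table can be sketched for a single frame in plain Python. This is a minimal illustration under stated assumptions, not the official DAVIS, JHMDB, VIP, or CVC-ClinicDB evaluation code, and the function names are hypothetical. J is the Jaccard index (intersection over union) between predicted and ground-truth masks; benchmarks report it as a percentage averaged over objects and frames. For PCK@0.1 the reference size that scales the distance threshold (for example, a person's bounding-box size) differs between evaluation protocols, so `ref_size` is an assumption here, as is scoring an empty-versus-empty mask as 1.

```python
import math


def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks (flat 0/1 lists)."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return 1.0 if union == 0 else inter / union  # empty vs. empty scored 1 (assumed convention)


def dice(pred, gt):
    """Dice score: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    total = sum(1 for p in pred if p) + sum(1 for g in gt if g)
    return 1.0 if total == 0 else 2 * inter / total


def mean_iou(pred, gt, num_classes):
    """mIoU over label maps (flat lists of class ids), skipping classes absent from both."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 1.0


def pck(pred_pts, gt_pts, ref_size, alpha=0.1):
    """PCK@alpha: fraction of keypoints within alpha * ref_size of the ground truth."""
    hits = sum(1 for (px, py), (gx, gy) in zip(pred_pts, gt_pts)
               if math.hypot(px - gx, py - gy) <= alpha * ref_size)
    return hits / len(gt_pts)


print(jaccard([1, 1, 0, 0], [1, 0, 1, 0]))           # 1 shared pixel out of 3 in the union
print(pck([(0, 0), (10, 0)], [(0, 0), (0, 0)], 50))  # threshold 0.1 * 50 = 5 pixels → 0.5
```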