Cross-View Completion Models are Zero-shot Correspondence Estimators
About
In this work, we explore new perspectives on cross-view completion learning by drawing an analogy to self-supervised correspondence learning. Through our analysis, we demonstrate that the cross-attention map within cross-view completion models captures correspondence more effectively than correlations derived from encoder or decoder features. We verify the effectiveness of the cross-attention map by evaluating it on zero-shot matching as well as on learning-based geometric matching and multi-frame depth estimation. The project page is available at https://cvlab-kaist.github.io/ZeroCo/.
Honggyu An, Jinhyeon Kim, Seonghoon Park, Jaewoo Jung, Jisang Han, Sunghwan Hong, Seungryong Kim • 2024
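The core idea — reading dense correspondences directly off a cross-attention map — can be sketched as follows. This is a minimal illustration, not the released implementation: the function name, the `(N_tgt, N_src)` attention layout, and the fixed patch size are assumptions made for the example.

```python
import numpy as np

def attention_to_correspondence(attn, tgt_hw, src_hw, patch=16):
    """Turn a cross-attention map into dense patch-level correspondences.

    attn   : (N_tgt, N_src) attention weights from a decoder cross-attention
             layer (target-view queries attending to source-view keys).
    tgt_hw : (Ht, Wt) target patch-grid size, so N_tgt = Ht * Wt.
    src_hw : (Hs, Ws) source patch-grid size, so N_src = Hs * Ws.
    patch  : patch size in pixels (assumed square).
    Returns matched (x, y) pixel centers in the target and source images.
    """
    Ht, Wt = tgt_hw
    Hs, Ws = src_hw
    assert attn.shape == (Ht * Wt, Hs * Ws)
    # Zero-shot matching: each target patch is matched to the source patch
    # that receives the highest attention weight.
    best_src = attn.argmax(axis=1)                              # (N_tgt,)
    # Flat patch indices -> (row, col) grid coordinates.
    tgt_rc = np.stack(np.divmod(np.arange(Ht * Wt), Wt), axis=1)
    src_rc = np.stack(np.divmod(best_src, Ws), axis=1)
    # (row, col) -> (x, y) pixel centers of each patch.
    tgt_xy = (tgt_rc[:, ::-1] + 0.5) * patch
    src_xy = (src_rc[:, ::-1] + 0.5) * patch
    return tgt_xy, src_xy
```

In this view, no matching head is trained: the attention weights produced during cross-view completion already act as a soft correlation volume, and an argmax over source patches yields the correspondences.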
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Depth Estimation | KITTI (Eigen split) | RMSE | 4.128 | 276 |
| Depth Estimation | Cityscapes (test) | -- | -- | 40 |
| Geometric Matching | HPatches 240 x 240 | AEE (I) | 0.49 | 33 |
| Geometric Matching | HPatches Original Resolution 3 | AEPE Threshold I | 1.51 | 31 |
| Geometric Matching | ETH3D Original Resolution | AEPE (Rate 3) | 1.8 | 19 |
| Depth Estimation | Cityscapes dynamic objects | AbsRel | 0.127 | 4 |