Cross-View Completion Models are Zero-shot Correspondence Estimators
About
In this work, we explore new perspectives on cross-view completion learning by drawing an analogy to self-supervised correspondence learning. Through our analysis, we demonstrate that the cross-attention map within cross-view completion models captures correspondence more effectively than correlations derived from encoder or decoder features. We verify the effectiveness of the cross-attention map by evaluating it on zero-shot matching as well as on learning-based geometric matching and multi-frame depth estimation. The project page is available at https://cvlab-kaist.github.io/ZeroCo/.
Honggyu An, Jinhyeon Kim, Seonghoon Park, Jaewoo Jung, Jisang Han, Sunghwan Hong, Seungryong Kim • 2024
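The core idea — reading dense correspondences directly off a cross-attention map — can be sketched as follows. This is a minimal illustration, not the released implementation: the function name, the `(N_tgt, N_src)` attention layout, and the fixed patch size are assumptions made for the example.

```python
import numpy as np

def attention_to_correspondence(attn, tgt_hw, src_hw, patch=16):
    """Turn a cross-attention map into dense patch-level correspondences.

    attn   : (N_tgt, N_src) attention weights from a decoder cross-attention
             layer (target-view queries attending to source-view keys).
    tgt_hw : (Ht, Wt) target patch-grid size, so N_tgt = Ht * Wt.
    src_hw : (Hs, Ws) source patch-grid size, so N_src = Hs * Ws.
    patch  : patch size in pixels (assumed square).
    Returns matched (x, y) pixel centers in the target and source images.
    """
    Ht, Wt = tgt_hw
    Hs, Ws = src_hw
    assert attn.shape == (Ht * Wt, Hs * Ws)
    # Zero-shot matching: each target patch is matched to the source patch
    # that receives the highest attention weight.
    best_src = attn.argmax(axis=1)                              # (N_tgt,)
    # Flat patch indices -> (row, col) grid coordinates.
    tgt_rc = np.stack(np.divmod(np.arange(Ht * Wt), Wt), axis=1)
    src_rc = np.stack(np.divmod(best_src, Ws), axis=1)
    # (row, col) -> (x, y) pixel centers of each patch.
    tgt_xy = (tgt_rc[:, ::-1] + 0.5) * patch
    src_xy = (src_rc[:, ::-1] + 0.5) * patch
    return tgt_xy, src_xy
```

In this view, no matching head is trained: the attention weights produced during cross-view completion already act as a soft correlation volume, and an argmax over source patches yields the correspondences.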
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Depth Estimation | KITTI (Eigen split) | RMSE | 4.128 | 276 |
| Depth Estimation | Cityscapes (test) | -- | -- | 40 |
| Geometric Matching | HPatches 240 x 240 | AEE (I) | 0.49 | 33 |
| Geometric Matching | HPatches Original Resolution 3 | AEPE Threshold I | 1.51 | 31 |
| Geometric Matching | ETH3D Original Resolution | AEPE (Rate 3) | 1.8 | 19 |
| Depth Estimation | Cityscapes dynamic objects | AbsRel | 0.127 | 4 |