COTR: Correspondence Transformer for Matching Across Images

About

We propose a novel framework for finding correspondences in images based on a deep neural network that, given two images and a query point in one of them, finds its correspondence in the other. By doing so, one has the option to query only the points of interest and retrieve sparse correspondences, or to query all points in an image and obtain dense mappings. Importantly, in order to capture both local and global priors, and to let our model relate between image regions using the most relevant among said priors, we realize our network using a transformer. At inference time, we apply our correspondence network by recursively zooming in around the estimates, yielding a multiscale pipeline able to provide highly-accurate correspondences. Our method significantly outperforms the state of the art on both sparse and dense correspondence problems on multiple datasets and tasks, ranging from wide-baseline stereo to optical flow, without any retraining for a specific dataset. We commit to releasing data, code, and all the tools necessary to train from scratch and ensure reproducibility.

Wei Jiang, Eduard Trulls, Jan Hosang, Andrea Tagliasacchi, Kwang Moo Yi• 2021

Related benchmarks

Task	Dataset	Result
Optical Flow Estimation	KITTI 2015 (test)	Fl-all13.65	108
Homography Estimation	HPatches	AUC @3px41.9	55
Point Tracking	DAVIS TAP-Vid	Average Jaccard (AJ)35.4	52
Retinal Image Alignment	FIRE	Acceptable Success Rate97.01	48
Point Tracking	TAP-Vid Kinetics	Overall Accuracy57.4	48
Point Tracking	DAVIS	AJ35.4	38
Retinal Image Alignment	KBSMC	Acceptable Rate32.62	35
Retinal Image Alignment	FLORI21	Acceptable Rate60	35
Point Tracking	TAP-Vid RGB-Stacking (test)	AJ6.8	32
Point Tracking	TAP-Vid DAVIS (test)	AJ35.4	31

Showing 10 of 27 rows

Other info

Follow for update

@wizwand_team Discord