XoFTR: Cross-modal Feature Matching Transformer

About

We introduce XoFTR, a cross-modal, cross-view method for local feature matching between thermal infrared (TIR) and visible images. Unlike visible images, TIR images are less susceptible to adverse lighting and weather conditions, but they are difficult to match because of significant texture and intensity differences. Current hand-crafted and learning-based methods for visible-TIR matching fall short in handling viewpoint, scale, and texture diversities. To address this, XoFTR incorporates masked image modeling pre-training and fine-tuning with pseudo-thermal image augmentation to handle the modality differences. Additionally, we introduce a refined matching pipeline that adjusts for scale discrepancies and enhances match reliability through sub-pixel level refinement. To validate our approach, we collect a comprehensive visible-thermal dataset, and show that our method outperforms existing methods on many benchmarks.

Önder Tuzcuoğlu, Aybora Köksal, Buğra Sofu, Sinan Kalkan, A. Aydın Alatan • 2024

Related benchmarks

Task	Dataset	Metric	Value
Retinal Image Alignment	FIRE	Acceptable Success Rate	98.51	48
Retinal Image Alignment	KBSMC	Acceptable Rate	35.29	35
Retinal Image Alignment	FLORI21	Acceptable Rate	93.33	35
Image Matching	Medical Retina (test)	AUC @ 3px Tolerance	38.97	13
Homography Estimation	DIODE RGB-Normal (test)	AUC @ 3px	19.79	13
Pose Estimation	METU-VisTIR RGB-IR (test)	AUC @ 5°	18.47	13
Homography Estimation	DIODE RGB-Depth (test)	AUC @ 3px	11.03	13
Pose Estimation	Any-syn RGB-IR synthetic (test)	AUC @ 5°	27.03	13
Image Matching	Remote Sensing (test)	AUC @ 3px	23.31	13
Pose Estimation	Any-syn RGB-Normal synthetic (test)	AUC @ 5°	10.33	13

Showing 10 of 26 rows
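
The page does not say how the AUC figures in the table are computed. As a rough illustration only, the sketch below assumes the convention common in feature-matching evaluations: the area under the cumulative error curve up to a threshold (for example 5° of pose error or 3 px of reprojection error), divided by the threshold. The function name `error_auc` and the sample errors are hypothetical and not taken from the paper; the values in the table look like this quantity expressed as a percentage.

```python
from bisect import bisect_left


def error_auc(errors, threshold):
    """Normalized area under the recall-vs-error curve on [0, threshold].

    Assumed convention (not confirmed by the source): sort the per-pair
    errors, treat recall as the fraction of pairs at or below each error,
    integrate with the trapezoidal rule, and divide by the threshold.
    """
    if not errors or threshold <= 0:
        return 0.0

    # Prepend 0 so the curve starts at (error=0, recall=0).
    errs = [0.0] + sorted(errors)
    n = len(errs)
    recall = [i / (n - 1) for i in range(n)]

    # Number of curve points whose error is strictly below the threshold.
    last = bisect_left(errs, threshold)

    # Clip the curve at the threshold, holding recall flat on the last segment.
    xs = errs[:last] + [threshold]
    ys = recall[:last] + [recall[last - 1]]

    # Trapezoidal integration over the clipped curve.
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0

    return area / threshold


if __name__ == "__main__":
    # Hypothetical pose errors in degrees for three image pairs.
    sample_errors = [1.0, 2.0, 10.0]
    print(f"AUC @ 5 deg: {100 * error_auc(sample_errors, 5.0):.2f}")
```

Under this convention, a method whose errors are all zero scores 1.0 (100 as a percentage), and a method whose errors all exceed the threshold scores 0.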
