LF-Net: Learning Local Features from Images

About

We present a novel deep architecture and a training strategy to learn a local feature pipeline from scratch, using collections of images without the need for human supervision. To do so we exploit depth and relative camera pose cues to create a virtual target that the network should achieve on one image, provided the outputs of the network for the other image. While this process is inherently non-differentiable, we show that we can optimize the network in a two-branch setup by confining it to one branch, while preserving differentiability in the other. We train our method on both indoor and outdoor datasets, with depth data from 3D sensors for the former, and depth estimates from an off-the-shelf Structure-from-Motion solution for the latter. Our models outperform the state of the art on sparse feature matching on both datasets, while running at 60+ fps for QVGA images.

Yuki Ono, Eduard Trulls, Pascal Fua, Kwang Moo Yi • 2018
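The abstract describes using depth and relative camera pose to build a target in one image from the network's outputs on the other. The geometric step behind that idea, moving a pixel from image i into image j, can be sketched as below. This is a minimal illustration and not the authors' implementation: it assumes a pinhole camera model, and the function name `warp_points` and all parameter names are hypothetical.

```python
import numpy as np

def warp_points(uv, depth, K_i, K_j, R_ij, t_ij):
    """Map pixel coordinates from image i into image j (pinhole model assumed).

    uv:         (N, 2) pixel coordinates (x, y) in image i
    depth:      (N,) depth along the camera z-axis at those pixels
    K_i, K_j:   (3, 3) intrinsics of the two cameras
    R_ij, t_ij: rotation (3, 3) and translation (3,) taking camera-i
                coordinates to camera-j coordinates
    Returns the warped (N, 2) coordinates and a boolean mask of points
    that land in front of camera j.
    """
    uv = np.asarray(uv, dtype=float)
    depth = np.asarray(depth, dtype=float)

    # Homogeneous pixels -> rays with z = 1 in camera i.
    pix_h = np.hstack([uv, np.ones((uv.shape[0], 1))])
    rays = pix_h @ np.linalg.inv(K_i).T

    # Scale each ray by its depth to get 3D points, then change camera frame.
    X_i = rays * depth[:, None]
    X_j = X_i @ R_ij.T + t_ij

    # Project into image j; points behind the camera are marked invalid.
    proj = X_j @ K_j.T
    valid = X_j[:, 2] > 0
    uv_j = np.full_like(uv, np.nan)
    uv_j[valid] = proj[valid, :2] / proj[valid, 2:3]
    return uv_j, valid

# Usage: a QVGA-sized camera (320x240) with an assumed focal length of 100 px.
K = np.array([[100.0, 0.0, 160.0],
              [0.0, 100.0, 120.0],
              [0.0, 0.0, 1.0]])
uv_j, valid = warp_points([[160.0, 120.0]], [2.0], K, K,
                          np.eye(3), np.array([0.1, 0.0, 0.0]))
print(uv_j, valid)
```

In this example the image centre at depth 2 shifts 5 px to the right under a 0.1-unit sideways translation. Keypoints warped this way from one image could serve as reference locations in the other; how LF-Net turns them into a training target is described in the paper, not here.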

Related benchmarks

Task	Dataset	Metric	Result	
Homography Estimation	HPatches	Overall Accuracy (< 1px)	34.4	81
Image Matching	Kinect 1	MS	0.44	38
Image Matching	Kinect 2	Matching Score (MS)	0.51	38
Image Matching	DeSurT (833 pairs total)	MS Score	28	38
Image Matching	Simulation	MS	21	38
Homography Estimation	HPatches (viewpoint)	Accuracy (<1px)	16.8	27
Image Matching	HPatches (full)	MMA (Viewpoint)	20	21
Keypoint Matching	HPatches All variations	Repeatability	43.8	17
Local Feature Matching	HPatches illumination	MMA@5px	62.21	15
Local Feature Matching	HPatches (all)	MMA@5px	56.45	15

