RF-Net: An End-to-End Image Matching Network based on Receptive Field
About
This paper proposes RF-Net, a new end-to-end trainable matching network based on receptive fields, to compute sparse correspondences between images. Building an end-to-end trainable matching framework is desirable but challenging. A very recent approach, LF-Net, successfully embeds the entire feature extraction process into a jointly trainable pipeline and produces state-of-the-art matching results. This paper introduces two modifications to the structure of LF-Net. First, we propose to construct receptive feature maps, which lead to more effective keypoint detection. Second, we introduce a general loss function term, the neighbor mask, to facilitate training patch selection, which improves the stability of descriptor training. We trained RF-Net on the open dataset HPatches and compared it with other methods on multiple benchmark datasets. Experiments show that RF-Net outperforms existing state-of-the-art methods.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Captioning | MS COCO Karpathy (test) | CIDEr | 1.219 | 682 |
| Local Feature Matching | HPatches (all) | MMA@5px | 59.08 | 15 |
| Local Feature Matching | HPatches (viewpoint) | MMA@5px | 56.62 | 15 |
| Local Feature Matching | HPatches (illumination) | MMA@5px | 61.63 | 15 |
| Camera pose estimation | MVS | mAA @ 20 deg | 10 | 15 |
| Local Descriptor Matching | Roto-360 1.0 (test) | MMA@10px | 15.64 | 14 |
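
The MMA figures in the table are mean matching accuracy scores at a pixel threshold. The page does not define the metric, so the following is only a minimal sketch under the usual assumption for HPatches-style evaluation: a match counts as correct when its keypoint, mapped through the ground-truth homography, lands within the threshold of its matched keypoint, and the per-pair accuracies are averaged. The function names, the inclusive threshold, and the input layout are illustrative choices, not the evaluation code behind the reported numbers.

```python
import math


def project(H, pt):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def matching_accuracy(matches, H, threshold_px=5.0):
    """Fraction of matches with reprojection error <= threshold_px.

    matches: list of ((xa, ya), (xb, yb)) keypoint pairs, image A to image B.
    H: ground-truth homography mapping image A coordinates to image B.
    The inclusive comparison is an assumption; conventions differ.
    """
    if not matches:
        return 0.0
    correct = 0
    for p_a, p_b in matches:
        u, v = project(H, p_a)
        if math.hypot(u - p_b[0], v - p_b[1]) <= threshold_px:
            correct += 1
    return correct / len(matches)


def mean_matching_accuracy(pairs, threshold_px=5.0):
    """Average the per-pair accuracy over (matches, H) image pairs."""
    scores = [matching_accuracy(m, H, threshold_px) for m, H in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: a pure translation by 10 px along x.
H_shift = [[1.0, 0.0, 10.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_matches = [
    ((0.0, 0.0), (10.0, 0.0)),  # exact match, error 0 px
    ((0.0, 0.0), (13.0, 4.0)),  # error 5 px, on the threshold
    ((0.0, 0.0), (20.0, 0.0)),  # error 10 px, counted as wrong
]
score = matching_accuracy(toy_matches, H_shift, threshold_px=5.0)
```

Under this sketch, an MMA@5px value such as 59.08 would read as roughly 59% of matches falling within 5 pixels of their ground-truth location, averaged over the image pairs in the benchmark.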