D2-Net: A Trainable CNN for Joint Detection and Description of Local Features
About
In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. We propose an approach where a single convolutional neural network plays a dual role: It is simultaneously a dense feature descriptor and a feature detector. By postponing the detection to a later stage, the obtained keypoints are more stable than their traditional counterparts based on early detection of low-level structures. We show that this model can be trained using pixel correspondences extracted from readily available large-scale SfM reconstructions, without any further annotations. The proposed method obtains state-of-the-art performance on both the difficult Aachen Day-Night localization dataset and the InLoc indoor localization benchmark, as well as competitive performance on other benchmarks for image matching and 3D reconstruction.
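The core idea above (detect-and-describe from a single dense feature map) can be illustrated with a minimal NumPy sketch of a D2-Net-style soft detection score: a local spatial softmax per channel (soft local-max) combined with a ratio-to-channel-max, with the per-pixel score taken as the best channel and normalised over the image. This is an illustrative re-implementation of the published formulation, not the authors' PyTorch code; the 3x3 neighbourhood and the assumption of non-negative (post-ReLU) features follow the paper's setup.

```python
import numpy as np

def detection_scores(D):
    """Soft detection scores from a dense feature map D of shape (h, w, c).

    Sketch of the D2-Net scoring: alpha is a soft local-max over a 3x3
    spatial neighbourhood per channel, beta is the ratio of each channel
    response to the per-pixel channel maximum (features assumed non-negative,
    e.g. post-ReLU). The score is max over channels of alpha * beta,
    normalised to sum to 1 over the image.
    """
    h, w, c = D.shape
    # alpha: soft local-max; subtract the global max for numerical stability
    exp_D = np.exp(D - D.max())
    padded = np.pad(exp_D, ((1, 1), (1, 1), (0, 0)), mode="constant")
    neigh_sum = np.zeros_like(exp_D)
    for di in range(3):          # sum exp over the 3x3 neighbourhood
        for dj in range(3):
            neigh_sum += padded[di:di + h, dj:dj + w, :]
    alpha = exp_D / neigh_sum
    # beta: ratio to the strongest channel at each pixel (eps avoids 0/0)
    beta = D / (D.max(axis=2, keepdims=True) + 1e-12)
    # per-pixel score: best channel, then image-wide normalisation
    gamma = (alpha * beta).max(axis=2)
    return gamma / gamma.sum()
```

At inference, keypoints would then be selected as local maxima of this score map, with the corresponding descriptors read off the same feature map, which is what makes detection and description joint.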
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Homography Estimation | HPatches | Overall Accuracy (< 1px) | 16.7 | 81 |
| Visual Localization | Aachen Day-Night v1.1 (Day) | SR (0.25m, 2°) | 84.1 | 70 |
| Homography Estimation | HPatches | AUC @ 3px | 23.2 | 55 |
| Image Matching | Kinect 1 | Matching Score (MS) | 0.2 | 38 |
| Image Matching | Simulation | Matching Score (MS) | 11 | 38 |
| Image Matching | Kinect 2 | Matching Score (MS) | 0.23 | 38 |
| Image Matching | DeSurT (833 pairs total) | Matching Score (MS) | 14 | 38 |
| Pose Estimation | MegaDepth 1500 (test) | AUC @ 5° | 35.4 | 38 |
| Visual Localization | RobotCar Seasons (night) | Recall (0.25m, 2°) | 20.4 | 35 |
| Visual Localization | Extended CMU Seasons Urban | Recall (0.25m, 2°) | 94 | 34 |