Decoupling Makes Weakly Supervised Local Feature Better
About
Weakly supervised learning can help local feature methods overcome the obstacle of acquiring a large-scale dataset with densely labeled correspondences. However, since weak supervision cannot distinguish the losses caused by the detection step from those caused by the description step, directly conducting weakly supervised learning within a joint describe-then-detect pipeline suffers from limited performance. In this paper, we propose a decoupled describe-then-detect pipeline tailored for weakly supervised local feature learning. Within our pipeline, the detection step is decoupled from the description step and postponed until discriminative and robust descriptors are learned. In addition, we introduce a line-to-window search strategy that explicitly uses the camera pose information for better descriptor learning. Extensive experiments show that our method, namely PoSFeat (Camera Pose Supervised Feature), outperforms previous fully and weakly supervised methods and achieves state-of-the-art performance on a wide range of downstream tasks.
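The abstract does not spell out how camera pose constrains the correspondence search, so the following is only a minimal sketch of the standard two-view geometry that such a strategy could build on. It assumes the "line" in line-to-window refers to the epipolar line induced by a known relative pose; the function names, the pose convention `X2 = R @ X1 + t`, and the example intrinsics are illustrative assumptions, not PoSFeat's actual implementation.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_from_pose(K1, K2, R, t):
    """Fundamental matrix F with x2^T F x1 = 0, assuming X2 = R @ X1 + t (hypothetical convention)."""
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_line(F, pt1):
    """Line (a, b, c) in image 2 for pixel pt1 in image 1, scaled so that a^2 + b^2 = 1."""
    line = F @ np.array([pt1[0], pt1[1], 1.0])
    return line / np.linalg.norm(line[:2])

def point_line_distance(line, pt2):
    """Distance in pixels from pt2 to a normalized line in image 2."""
    return abs(line @ np.array([pt2[0], pt2[1], 1.0]))

# Illustrative setup: shared intrinsics, pure sideways translation between the views.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])

X1 = np.array([0.5, 0.2, 4.0])         # 3D point in camera-1 coordinates
X2 = R @ X1 + t                        # same point in camera-2 coordinates
p1 = (K @ X1)[:2] / X1[2]              # projection into image 1
p2 = (K @ X2)[:2] / X2[2]              # projection into image 2

F = fundamental_from_pose(K, K, R, t)
line = epipolar_line(F, p1)
print("distance of true match to epipolar line:", point_line_distance(line, p2))
```

Under these assumptions, the true match of a pixel lies on its epipolar line, so candidate descriptors only need to be compared along that line (and, presumably, within a small window around the best candidate) rather than across the whole image.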
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Visual Localization | Aachen Day-Night v1.1 (test) | Success Rate (0.25m, 2°) | 73.8 | 24 |
| Image Matching | HPatches (full) | MMA (Viewpoint) | 0.728 | 21 |
| Visual Localization | Aachen Day-Night 1.0 (Night) | AUC @ (0.25m, 2°) | 81.6 | 18 |
| Sparse 3D Reconstruction | ETH Local Feature Benchmark Madrid Metropolis v1.0 | nReg | 419 | 17 |
| 3D Reconstruction | ETH local feature benchmark Tower of London | Image Count | 778 | 16 |
| 3D Reconstruction | ETH local feature benchmark Gendarmenmarkt | Image Count | 956 | 16 |
| Image Matching | HPatches Overall v2 | MMAscore Overall | 0.775 | 15 |
| Local Descriptor Matching | Roto-360 1.0 (test) | MMA @10px | 13.76 | 14 |
| Local Feature Matching | HPatches Overall v1.0 | MMAscore | 77.5 | 12 |
| Local Feature Matching | HPatches Viewpoint v1.0 | MMAscore | 72.8 | 12 |