
Segment Anything Model is a Good Teacher for Local Feature Learning

About

Local feature detection and description, which aim to detect and describe keypoints in "any scene" and for "any downstream task", play an important role in many computer vision tasks. Data-driven local feature learning methods rely on pixel-level correspondence for training, which is challenging to acquire at scale and thus hinders further improvements in performance. In this paper, we propose SAMFeat, which introduces SAM (Segment Anything Model), a foundation model trained on 11 million images, as a teacher to guide local feature learning and thereby achieve higher performance on limited datasets. To do so, first, we construct an auxiliary task of Attention-weighted Semantic Relation Distillation (ASRD), which distills feature relations carrying the category-agnostic semantic information learned by the SAM encoder into a local feature learning network, improving local feature description through semantic discrimination. Second, we develop a technique called Weakly Supervised Contrastive Learning Based on Semantic Grouping (WSC), which uses semantic groupings derived from SAM as weakly supervised signals to optimize the metric space of local descriptors. Third, we design an Edge Attention Guidance (EAG) module that further improves the accuracy of local feature detection and description by prompting the network to pay more attention to edge regions, guided by SAM. SAMFeat's performance on tasks such as image matching on HPatches and long-term visual localization on Aachen Day-Night demonstrates its superiority over previous local features. The code is available at https://github.com/vignywang/SAMFeat.
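The two distillation ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the L1 gap between relation matrices, the per-location `weights` standing in for the attention weighting, and the `margin` value are all assumptions made for illustration. It only shows the general shape of relation distillation (match pairwise similarities rather than raw features) and of a grouping-based contrastive loss (points in the same SAM-derived group attract, others repel).

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def relation_matrix(features):
    """Pairwise cosine similarity between the rows of an (N, C) array."""
    f = l2_normalize(features)
    return f @ f.T


def relation_distillation_loss(student_desc, teacher_feat, weights=None):
    """Mean absolute gap between student and teacher relation matrices.

    `weights` is an optional (N,) array of per-location importance; it is a
    hypothetical stand-in for the attention weighting named in the abstract.
    """
    gap = np.abs(relation_matrix(student_desc) - relation_matrix(teacher_feat))
    if weights is None:
        return float(gap.mean())
    w = np.outer(weights, weights)
    return float((w * gap).sum() / (w.sum() + 1e-8))


def grouping_contrastive_loss(desc, group_ids, margin=0.5):
    """Pull same-group descriptors together and push other groups apart.

    `group_ids` is an (N,) integer array, e.g. the index of the segmentation
    mask that each sampled point falls in (assumed input format).
    """
    sim = relation_matrix(desc)
    same = group_ids[:, None] == group_ids[None, :]
    pos = same & ~np.eye(len(desc), dtype=bool)  # same group, excluding self-pairs
    neg = ~same
    pos_term = (1.0 - sim[pos]).mean() if pos.any() else 0.0
    neg_term = np.maximum(sim[neg] - margin, 0.0).mean() if neg.any() else 0.0
    return float(pos_term + neg_term)
```

Because both relation matrices are N x N, the student descriptors and the teacher features may have different channel counts, which is one reason to distill relations instead of regressing the teacher's features directly.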

Jingqian Wu, Rongtao Xu, Zach Wood-Doughty, Changwei Wang, Shibiao Xu, Edmund Y. Lam • 2023

Related benchmarks

Task                 Dataset                                      Metric                    Result  Rank
Visual Localization  Aachen Day-Night v1.1 (Day)                  SR (0.25m, 2°)            90.2    70
Pose Estimation      MegaDepth 1500 (test)                        AUC @ 5°                  52.3    38
3D Reconstruction    ETH local feature benchmark Gendarmenmarkt   Track Length              7.61    24
3D Reconstruction    ETH local feature benchmark Tower of London  Track Length              7.76    24
3D Reconstruction    Madrid Metropolis                            Track Length              8.67    19
3D Reconstruction    ETH Herzjesu Small-Scale                     Track Length              4.97    16
Visual Localization  Aachen Day-Night v1.1                        Success Rate (2°, 0.25m)  75.9    12
Pose Estimation      ScanNet (test)                               AUC@5°                    15.2    11
Image Matching       HPatches 1 (test)                            MMA (3px)                 81.96   10
3D Reconstruction    ETH Fountain Small-Scale                     Track Length              4.22    8
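
For readers unfamiliar with the MMA (3px) metric reported for HPatches, the following is a minimal sketch of how mean matching accuracy is commonly computed: a match counts as correct when the keypoint from the first image, warped by the ground-truth homography, lands within a pixel threshold of its matched keypoint in the second image, and the per-pair accuracies are averaged. The function names and the inclusive `<=` threshold are assumptions for illustration, not taken from the benchmark's evaluation code.

```python
import numpy as np


def warp_points(points, homography):
    """Apply a 3x3 homography to an (N, 2) array of (x, y) points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    warped = homogeneous @ homography.T
    return warped[:, :2] / warped[:, 2:3]


def matching_accuracy(kpts_a, kpts_b, homography, threshold_px=3.0):
    """Fraction of matches whose reprojection error is within the threshold."""
    if len(kpts_a) == 0:
        return 0.0  # assumed convention: no matches means zero accuracy
    errors = np.linalg.norm(warp_points(kpts_a, homography) - kpts_b, axis=1)
    return float((errors <= threshold_px).mean())


def mean_matching_accuracy(pairs, threshold_px=3.0):
    """Average matching accuracy over (kpts_a, kpts_b, homography) tuples."""
    scores = [matching_accuracy(a, b, h, threshold_px) for a, b, h in pairs]
    return float(np.mean(scores)) if scores else 0.0
```

The functions return a fraction in [0, 1]; the table above reports the value as a percentage.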