Semantic Reinforced Attention Learning for Visual Place Recognition
About
Large-scale visual place recognition (VPR) is inherently challenging because not all visual cues in the image are beneficial to the task. In order to highlight the task-relevant visual cues in the feature embedding, the existing attention mechanisms are either based on artificial rules or trained in a thorough data-driven manner. To fill the gap between the two types, we propose a novel Semantic Reinforced Attention Learning Network (SRALNet), in which the inferred attention can benefit from both semantic priors and data-driven fine-tuning. The contribution lies in two-folds. (1) To suppress misleading local features, an interpretable local weighting scheme is proposed based on hierarchical feature distribution. (2) By exploiting the interpretability of the local weighting scheme, a semantic constrained initialization is proposed so that the local attention can be reinforced by semantic priors. Experiments demonstrate that our method outperforms state-of-the-art techniques on city-scale VPR benchmark datasets.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Visual Place Recognition | Tokyo24/7 | Recall@172.1 | 146 | |
| Visual Place Recognition | Pitts250k | Recall@187.8 | 84 |