SAGE: Spatial-visual Adaptive Graph Exploration for Efficient Visual Place Recognition

About

Visual Place Recognition (VPR) requires robust retrieval of geotagged images despite large appearance, viewpoint, and environmental variation. Prior methods focus on descriptor fine-tuning or fixed sampling strategies yet neglect the dynamic interplay between spatial context and visual similarity during training. We present SAGE (Spatial-visual Adaptive Graph Exploration), a unified training pipeline that enhances granular spatial-visual discrimination by jointly improving local feature aggregation, training-sample organization, and hard sample mining. We introduce a lightweight Soft Probing module that learns residual weights from training data for patch descriptors before bilinear aggregation, boosting distinctive local cues. During training we reconstruct an online geo-visual graph that fuses geographic proximity and current visual similarity so that candidate neighborhoods reflect the evolving embedding landscape. To concentrate learning on the most informative place neighborhoods, we seed clusters from high-affinity anchors and iteratively expand them with a greedy weighted clique expansion sampler. Implemented with a frozen DINOv2 backbone and parameter-efficient fine-tuning, SAGE achieves state-of-the-art results across eight benchmarks. Notably, our method obtains 100% Recall@10 on SPED using only 4096-D global descriptors. The code and model are available at https://github.com/chenshunpeng/SAGE.
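
The abstract describes the geo-visual graph and the greedy weighted clique expansion sampler only at a high level, so the following is a minimal illustrative sketch, not the authors' implementation (see the linked repository for that). It assumes planar coordinates in metres and L2-normalized descriptors. The function names, the exponential distance kernel, the mixing weight `alpha`, and the thresholds `geo_radius` and `min_affinity` are all hypothetical choices made for illustration. The Soft Probing module is not covered here.

```python
import numpy as np

def geo_visual_affinity(coords, desc, geo_radius=25.0, alpha=0.5):
    """Fuse geographic proximity and visual similarity into one affinity matrix.

    coords: (N, 2) planar positions in metres.
    desc:   (N, D) L2-normalized descriptors from the current model state.
    geo_radius and alpha are hypothetical knobs, not values from the paper.
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    geo = np.exp(-dist / geo_radius)          # 1 at zero distance, decays with distance
    vis = np.clip(desc @ desc.T, 0.0, 1.0)    # cosine similarity, negatives clipped to 0
    aff = alpha * geo + (1.0 - alpha) * vis
    np.fill_diagonal(aff, 0.0)                # no self-edges
    return aff

def greedy_clique_expansion(aff, seed, size, min_affinity=0.3):
    """Grow a cluster from `seed`, one node at a time.

    A candidate is admissible only if its affinity to every current member is
    at least min_affinity (so the cluster stays a clique in the thresholded
    graph). Among admissible candidates, the one with the largest total
    affinity to the current members is added.
    """
    members = [seed]
    candidates = set(range(aff.shape[0])) - {seed}
    while len(members) < size and candidates:
        best, best_score = None, -1.0
        for c in sorted(candidates):
            w = aff[c, members]
            if w.min() >= min_affinity and w.sum() > best_score:
                best, best_score = c, w.sum()
        if best is None:                      # nothing admissible is left
            break
        members.append(best)
        candidates.remove(best)
    return members

# Toy data: three images of one place near the origin, two of another far away.
coords = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [500.0, 500.0], [505.0, 500.0]])
desc = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

aff = geo_visual_affinity(coords, desc)
seed = int(aff.sum(axis=1).argmax())          # anchor with the highest total affinity
cluster = greedy_clique_expansion(aff, seed, size=4)
print(seed, cluster)
```

In an online setting, `aff` would be rebuilt periodically from the current descriptors, so that the sampled clusters follow the embedding as it changes during training.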

Shunpeng Chen, Changwei Wang, Rongtao Xu, Xingtian Pei, Yukun Song, Jinzhou Lin, Wenhao Xu, Jingyi Zhang, Li Guo, Shibiao Xu • 2025

Related benchmarks

Task	Dataset	Result
Visual Place Recognition	MSLS (val)	Recall@1 94.5	305
Visual Place Recognition	Tokyo24/7	Recall@1 97.5	229
Visual Place Recognition	Nordland	Recall@1 96	169
Visual Place Recognition	Pitts250k	Recall@1 95.7	163
Visual Place Recognition	SPED	Recall@1 98.9	118
Visual Place Recognition	Pittsburgh30k (test)	Recall@1 95.8	106
Visual Place Recognition	Pitts 250k (test)	Recall@1 98.4	73
Visual Place Recognition	Eynsham	Recall@1 92.3	66
Visual Place Recognition	Tokyo24/7 (test)	Recall@1 97.5	29
Visual Place Recognition	AmsterTime (test)	Recall@1 83.5	16

Showing 10 of 12 rows
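
For readers unfamiliar with the Recall@K metric reported above, here is a minimal sketch of how it is commonly computed in VPR: a query counts as correct if any of its top-K retrieved database images lies within a distance tolerance of the query's true position. The function name `recall_at_k` and the 25 m tolerance are assumptions for illustration. Tolerances differ between datasets (some use frame offsets instead of metres), and this listing does not state the evaluation protocol behind each row.

```python
import numpy as np

def recall_at_k(q_desc, db_desc, q_pos, db_pos, k=1, threshold_m=25.0):
    """Percentage of queries with at least one correct match in the top k.

    q_desc: (Q, D) query descriptors; db_desc: (M, D) database descriptors.
    q_pos:  (Q, 2) query positions;   db_pos:  (M, 2) database positions (metres).
    threshold_m is an assumed tolerance, not a value taken from this page.
    """
    sims = q_desc @ db_desc.T                       # (Q, M) similarity scores
    topk = np.argsort(-sims, axis=1)[:, :k]         # indices of the k best matches
    dists = np.linalg.norm(db_pos[topk] - q_pos[:, None, :], axis=-1)  # (Q, k)
    hits = (dists <= threshold_m).any(axis=1)       # correct if any match is close enough
    return 100.0 * hits.mean()

# Toy example: three database images along a road, two queries.
db_desc = np.eye(3)
db_pos = np.array([[0.0, 0.0], [100.0, 0.0], [200.0, 0.0]])
q_desc = np.array([[0.9, 0.3, 0.1],    # best match is database image 0
                   [0.1, 0.9, 0.4]])   # best match is image 1, second best is image 2
q_pos = np.array([[10.0, 0.0], [200.0, 0.0]])

r1 = recall_at_k(q_desc, db_desc, q_pos, db_pos, k=1)
r2 = recall_at_k(q_desc, db_desc, q_pos, db_pos, k=2)
print(r1, r2)
```

The second query is missed at K=1 because its top match is 100 m away, but it is recovered at K=2, which is why Recall@K is non-decreasing in K.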
