
Optimal Transport Aggregation for Visual Place Recognition

About

The task of Visual Place Recognition (VPR) aims to match a query image against references from an extensive database of images from different places, relying solely on visual cues. State-of-the-art pipelines focus on the aggregation of features extracted from a deep backbone, in order to form a global descriptor for each image. In this context, we introduce SALAD (Sinkhorn Algorithm for Locally Aggregated Descriptors), which reformulates NetVLAD's soft-assignment of local features to clusters as an optimal transport problem. In SALAD, we consider both feature-to-cluster and cluster-to-feature relations, and we also introduce a 'dustbin' cluster designed to selectively discard features deemed non-informative, enhancing the overall descriptor quality. Additionally, we leverage and fine-tune DINOv2 as a backbone, which provides enhanced description power for the local features and dramatically reduces the required training time. As a result, our single-stage method not only surpasses single-stage baselines on public VPR datasets, but also surpasses two-stage methods that add a re-ranking stage at significantly higher cost. Code and models are available at https://github.com/serizba/salad.
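The abstract describes the optimal transport assignment only in prose. The sketch below illustrates the general idea in plain Python: a log-domain Sinkhorn iteration over a clusters-by-features score matrix augmented with one dustbin row, followed by NetVLAD-style aggregation of the features each cluster receives. Everything concrete here is an assumption for illustration, not taken from the SALAD code: the names `sinkhorn_with_dustbin` and `aggregate`, the constant `dustbin_score`, the marginals (each cluster receives one unit of mass, the dustbin absorbs the remaining n - m units, each feature distributes exactly one unit), the dot-product scores against hypothetical cluster vectors (SALAD learns its scores), and the choice to aggregate raw features rather than residuals. The DINOv2 backbone and all learned layers are omitted.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sinkhorn_with_dustbin(scores, dustbin_score=1.0, num_iters=50):
    """Log-domain Sinkhorn over (m clusters + 1 dustbin) x n features.

    scores: m rows (clusters) x n columns (local features), with n > m.
    The marginals below are an assumed, illustrative choice. Returns an
    (m + 1) x n plan rescaled so that every feature column sums to 1.
    """
    m, n = len(scores), len(scores[0])
    # Append the dustbin row with a constant (hypothetical) score.
    aug = [list(row) for row in scores] + [[dustbin_score] * n]
    norm = -math.log(n + m)
    # Row marginals: each cluster gets 1/(n+m); the dustbin absorbs (n-m)/(n+m).
    log_a = [norm] * m + [norm + math.log(n - m)]
    # Column marginals: each feature carries 1/(n+m).
    log_b = [norm] * n
    u, v = [0.0] * (m + 1), [0.0] * n
    for _ in range(num_iters):
        # Alternate row and column normalization in log space.
        u = [log_a[r] - logsumexp([aug[r][c] + v[c] for c in range(n)])
             for r in range(m + 1)]
        v = [log_b[c] - logsumexp([aug[r][c] + u[r] for r in range(m + 1)])
             for c in range(n)]
    # Undo the 1/(n+m) scaling so each feature column sums to 1.
    return [[math.exp(aug[r][c] + u[r] + v[c] - norm) for c in range(n)]
            for r in range(m + 1)]


def aggregate(features, plan):
    """Sum features weighted by cluster mass, dropping the dustbin row."""
    m = len(plan) - 1  # last row is the dustbin; its mass is discarded
    dim = len(features[0])
    descriptor = []
    for j in range(m):
        cluster = [sum(plan[j][i] * features[i][d] for i in range(len(features)))
                   for d in range(dim)]
        # Intra-normalization per cluster, as in NetVLAD.
        c_norm = math.sqrt(sum(x * x for x in cluster)) or 1.0
        descriptor.extend(x / c_norm for x in cluster)
    # Final global L2 normalization.
    total = math.sqrt(sum(x * x for x in descriptor)) or 1.0
    return [x / total for x in descriptor]


# Usage: 4 toy local features and 2 hypothetical cluster vectors
# (SALAD learns its scores instead of using fixed cluster vectors).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
centers = [[1.0, 0.0], [0.0, 1.0]]
scores = [[sum(a * b for a, b in zip(c, f)) for f in features] for c in centers]
plan = sinkhorn_with_dustbin(scores)
desc = aggregate(features, plan)
print(len(desc))  # 4 = 2 clusters x 2 dims
```

Because the rows and columns are normalized jointly, each cluster's mass depends on how every feature scores against every cluster and the dustbin, and a feature whose cluster scores fall below the dustbin score sends more of its unit of mass to the dustbin, so it contributes less to the final descriptor.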

Sergio Izquierdo, Javier Civera · 2023

Related benchmarks

Task                      Dataset                   Result (Recall@1)   Rank
Visual Place Recognition  MSLS (val)                94.2                305
Visual Place Recognition  Tokyo24/7                 96.8                229
Visual Place Recognition  Pitts30k                  92.6                170
Visual Place Recognition  Pitts250k                 95.2                163
Visual Place Recognition  Nordland                  89.7                163
Visual Place Recognition  MSLS Challenge            82.7                156
Visual Place Recognition  SPED                      92.1                118
Visual Place Recognition  Pittsburgh30k (test)      92.5                106
Visual Place Recognition  AmsterTime                58.8                100
Visual Place Recognition  Oxford RobotCar (Dusk)    94.8                78

Showing 10 of 65 rows.
