Rethinking Visual Geo-localization for Large-Scale Applications

About

Visual Geo-localization (VG) is the task of estimating the position where a given photo was taken by comparing it with a large database of images of known locations. To investigate how existing techniques would perform on a real-world city-wide VG application, we build San Francisco eXtra Large, a new dataset covering a whole city and providing a wide range of challenging cases, 30x larger than the previous largest dataset for visual geo-localization. We find that current methods fail to scale to such large datasets; we therefore design a new, highly scalable training technique called CosPlace, which casts training as a classification problem and so avoids the expensive mining needed by the commonly used contrastive learning. We achieve state-of-the-art performance on a wide range of datasets and find that CosPlace is robust to heavy domain changes. Moreover, we show that, compared to the previous state of the art, CosPlace requires roughly 80% less GPU memory at train time and achieves better results with 8x smaller descriptors, paving the way for city-wide real-world visual geo-localization. Dataset, code and trained models are available for research purposes at https://github.com/gmberton/CosPlace.
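The retrieval step described in the first sentence of the abstract can be illustrated with a minimal sketch. It is not the CosPlace implementation: the three-dimensional descriptors, the coordinates, and the function names (`l2_normalize`, `cosine_similarity`, `localize`) are all hypothetical, and a real system would obtain descriptors from a trained network and search a far larger database.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Dot product of two unit-length descriptors."""
    return sum(x * y for x, y in zip(a, b))


def localize(query_desc, database):
    """Return the known position of the database image most similar to the query.

    `database` is a list of (descriptor, position) pairs; both the layout and
    the toy descriptors below are assumptions made for illustration only.
    """
    q = l2_normalize(query_desc)
    _, best_position = max(
        database,
        key=lambda entry: cosine_similarity(q, l2_normalize(entry[0])),
    )
    return best_position


# Toy database: three images with made-up descriptors and (lat, lon) positions.
database = [
    ([1.0, 0.0, 0.0], (37.7749, -122.4194)),
    ([0.0, 1.0, 0.0], (37.8044, -122.2712)),
    ([0.0, 0.0, 1.0], (37.3382, -121.8863)),
]

query = [0.1, 0.9, 0.2]  # hypothetical descriptor of the query photo
estimated_position = localize(query, database)
print(estimated_position)
```

Because every database descriptor takes part in the comparison, descriptor size directly affects memory and search cost, which is why the abstract's smaller descriptors matter at city scale.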

Gabriele Berton, Carlo Masone, Barbara Caputo • 2022

Related benchmarks

Task	Dataset	Result
Visual Place Recognition	MSLS (val)	Recall@1: 87.4	305
Visual Place Recognition	Tokyo24/7	Recall@1: 89.5	229
Visual Place Recognition	Pitts30k	Recall@1: 90.9	176
Visual Place Recognition	Nordland	Recall@1: 58.5	169
Visual Place Recognition	Pitts250k	Recall@1: 92.3	163
Visual Place Recognition	MSLS Challenge	Recall@1: 67.5	156
Visual Place Recognition	SPED	Recall@1: 80.1	118
Visual Place Recognition	Pittsburgh30k (test)	Recall@1: 88.4	106
Visual Place Recognition	AmsterTime	Recall@1: 47.7	100
Visual Place Recognition	Oxford RobotCar (Dusk)	Recall@1: 68.5	78

Showing 10 of 97 rows
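The table reports Recall@1 without defining it. As a hedged sketch of how such a metric is commonly computed in place recognition, a query counts as correct if any of its top-N retrieved images lies within a distance threshold of the true position. The 25 m threshold, the planar metre coordinates, and the names `distance_m` and `recall_at_n` are assumptions for illustration and are not taken from this listing.

```python
import math


def distance_m(p, q):
    """Planar distance between two points given in metres (assumed projected coordinates)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def recall_at_n(ranked_predictions, true_positions, n=1, threshold_m=25.0):
    """Percentage of queries with at least one top-n prediction within threshold_m.

    `ranked_predictions[i]` lists retrieved positions for query i, best match first.
    The 25 m default is an assumed convention, not a value stated in the source.
    """
    hits = 0
    for ranked, truth in zip(ranked_predictions, true_positions):
        if any(distance_m(pos, truth) <= threshold_m for pos in ranked[:n]):
            hits += 1
    return 100.0 * hits / len(true_positions)


# Four made-up queries with their true positions and ranked retrieved positions.
true_positions = [(0, 0), (100, 100), (200, 0), (300, 300)]
ranked_predictions = [
    [(5, 5), (500, 500)],      # best match is about 7 m away
    [(400, 400), (110, 100)],  # best match is far, second match is 10 m away
    [(200, 20), (0, 0)],       # best match is 20 m away
    [(0, 0), (100, 100)],      # both matches are far from the true position
]

recall_1 = recall_at_n(ranked_predictions, true_positions, n=1)
recall_2 = recall_at_n(ranked_predictions, true_positions, n=2)
print(recall_1, recall_2)
```

With these toy inputs, two of the four queries are correct at N=1 and three at N=2, so widening N can only keep or raise the score.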
