Learning Generalized Zero-Shot Learners for Open-Domain Image Geolocalization

About

Image geolocalization is the challenging task of predicting the geographic coordinates of origin for a given photo. It is an unsolved problem relying on the ability to combine visual clues with general knowledge about the world to make accurate predictions across geographies. We present StreetCLIP (https://huggingface.co/geolocal/StreetCLIP), a robust, publicly available foundation model not only achieving state-of-the-art performance on multiple open-domain image geolocalization benchmarks but also doing so in a zero-shot setting, outperforming supervised models trained on more than 4 million images. Our method introduces a meta-learning approach for generalized zero-shot learning by pretraining CLIP from synthetic captions, grounding CLIP in a domain of choice. We show that our method effectively transfers CLIP's generalized zero-shot capabilities to the domain of image geolocalization, improving in-domain generalized zero-shot performance without finetuning StreetCLIP on a fixed set of classes.
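As a rough illustration of the two ideas in the abstract (synthetic captions built from location metadata, and zero-shot prediction by comparing an image embedding against caption embeddings), here is a minimal sketch. The caption template, the function names, and the three-dimensional toy vectors are all assumptions made for this example; the actual model uses CLIP's image and text encoders, and the paper's exact caption wording is not reproduced here.

```python
import math


def make_caption(city, region, country):
    """Hypothetical template: turn location metadata into a synthetic caption."""
    return f"A street-level photo near {city} in {region}, {country}."


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_emb, label_embs, temperature=0.01):
    """Softmax over scaled cosine similarities; returns (best label, probabilities).

    `label_embs` maps a label to the embedding of its caption. In a real
    pipeline these vectors would come from a text encoder (assumption: toy
    vectors stand in for them here).
    """
    labels = list(label_embs)
    logits = [cosine(image_emb, label_embs[name]) / temperature for name in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Toy embeddings standing in for encoder outputs.
label_embs = {
    "France": [1.0, 0.0, 0.0],
    "Japan": [0.0, 1.0, 0.0],
    "Brazil": [0.0, 0.0, 1.0],
}
image_emb = [0.9, 0.1, 0.0]
prediction, probabilities = zero_shot_predict(image_emb, label_embs)
```

Because the label set is just a dictionary of captions, it can be swapped at inference time without retraining, which is the property the abstract refers to as not finetuning on a fixed set of classes.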

Lukas Haas, Silas Alberti, Michal Skreta • 2023

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	EuroSAT (test)	--	--	195
Image Geolocalization	IM2GPS3K (test)	Success Rate (25 km)	22.4	167
Image Classification	Resisc45 (test)	Top-1 Accuracy	87.84	90
Image Geolocalization	Im2GPS3k	Success Rate @ 200 km	37.4	72
Classification	AID (test)	Top-1 Accuracy	92.77	69
Classification	WHU-RS19 (test)	Top-1 Accuracy	97.02	36
Image Geolocalization	IM2GPS	Success Rate @ 25 km (City)	28.3	34
Image Classification	PatternNet (test)	Top-1 Accuracy	95.6	28
Image Classification	MLRSNet (test)	Top-1 Accuracy	79.64	28
Image Classification	RSC11 (test)	Top-1 Accuracy	89.19	28

Showing 10 of 17 rows
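
The geolocalization rows report a success rate at a distance threshold (for example 25 km or 200 km). A minimal sketch of how such a metric can be computed is shown below, assuming that distance means great-circle (haversine) distance on a spherical Earth and that the rate is the percentage of predictions within the threshold. The function names and the sample coordinates are illustrative, not taken from the benchmarks.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def success_rate(preds, truths, threshold_km):
    """Percentage of predictions within `threshold_km` of the true location."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("preds and truths must be non-empty and of equal length")
    hits = sum(
        haversine_km(p[0], p[1], t[0], t[1]) <= threshold_km
        for p, t in zip(preds, truths)
    )
    return 100.0 * hits / len(preds)


# Illustrative data: one near-exact prediction, one on the wrong continent.
preds = [(48.8566, 2.3522), (35.0, 139.0)]
truths = [(48.85, 2.35), (40.7128, -74.0060)]
rate_25km = success_rate(preds, truths, threshold_km=25)
```

Evaluating the same predictions at several thresholds (for example 25, 200, and 750 km) gives the usual coarse-to-fine view of geolocalization accuracy.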
