CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations

About

Geo-tagged images are publicly available in large quantities, whereas labels such as object classes are scarce and expensive to collect. Meanwhile, contrastive learning has achieved tremendous success in various natural image and language tasks with limited labeled data. However, existing methods fail to fully leverage geospatial information, which can be paramount to distinguishing objects that are visually similar. To directly leverage the abundant geospatial information associated with images in the pre-training, fine-tuning, and inference stages, we present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images. We use a dual encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images, which can be transferred to downstream supervised tasks such as image classification. Experiments show that CSP can improve model performance on both the iNat2018 and fMoW datasets. In particular, on iNat2018, CSP significantly boosts model performance, with a 10-34% relative improvement across various labeled training data sampling ratios.

Gengchen Mai, Ni Lao, Yutong He, Jiaming Song, Stefano Ermon • 2023
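
The dual-encoder idea in the abstract above can be illustrated with a minimal sketch of one common contrastive objective, InfoNCE with in-batch negatives. The abstract does not say which contrastive objectives CSP actually uses, so this is an assumption chosen for illustration. The function names, the temperature value, and the toy embeddings are likewise hypothetical; in a real setup the embeddings would come from an image encoder and a location encoder.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce_loss(image_embs, location_embs, temperature=0.1):
    """In-batch InfoNCE (illustrative assumption, not necessarily CSP's loss).

    Image i is treated as the positive match for location i; every other
    location in the batch serves as a negative.
    """
    if len(image_embs) != len(location_embs) or not image_embs:
        raise ValueError("need equally sized, non-empty batches")
    imgs = [l2_normalize(v) for v in image_embs]
    locs = [l2_normalize(v) for v in location_embs]
    total = 0.0
    for i, img in enumerate(imgs):
        # Cosine similarity of image i to every location, scaled by temperature.
        logits = [sum(a * b for a, b in zip(img, loc)) / temperature for loc in locs]
        # Numerically stable negative log-softmax of the positive pair.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(imgs)


# Toy 2-D embeddings: matched pairs versus deliberately mismatched pairs.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is small when each image embedding is closest to its own location embedding and large when the pairs are mismatched, which is the signal that pushes the two encoders toward a shared representation space.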

Related benchmarks

Task	Dataset	Result
Classification	Land Cover	F1 Score: 44.4	76
Regression	California Housing	--	71
Classification	Land Use Coarse	F1 Score: 55.2	70
Classification	Land Use Fine	F1 Score: 45.7	70
Regression	Urban Perception avg. 6 tasks	R2 Score: 4.8	58
Regression	Crime Incidence	R-squared (%): 52.5	48
Urban Perception	Place Pulse 2.0	Cleanliness: 0.9	44
Regression	ZIP Code weighted avg. 29 tasks (cross-regional)	R^2: 0.422	40
Regression	ZIP Code weighted avg. 29 tasks	R^2 (%): 42.2	38
Regression	Energy Consumption	R^2 (%): 11.7	35

Showing 10 of 34 rows
