GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space

About

Timestamp prediction aims to determine when an image was captured using only visual information, supporting applications such as metadata correction, retrieval, and digital forensics. In outdoor scenarios, hourly estimates rely on cues like brightness, hue, and shadow positioning, while seasonal changes and weather inform date estimation. However, these visual cues significantly depend on geographic context, closely linking timestamp prediction to geo-localization. To address this interdependence, we introduce GT-Loc, a novel retrieval-based method that jointly predicts the capture time (hour and month) and geo-location (GPS coordinates) of an image. Our approach employs separate encoders for images, time, and location, aligning their embeddings within a shared high-dimensional feature space. Recognizing the cyclical nature of time, instead of conventional contrastive learning with hard positives and negatives, we propose a temporal metric-learning objective providing soft targets by modeling pairwise time differences over a cyclical toroidal surface. We present new benchmarks demonstrating that our joint optimization surpasses previous time prediction methods, even those using the ground-truth geo-location as an input during inference. Additionally, our approach achieves competitive results on standard geo-localization tasks, and the unified embedding space facilitates compositional and text-based image retrieval.
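To make the idea of soft targets over a cyclical toroidal surface concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function names (`cyclic_diff`, `toroidal_distance`, `soft_targets`), the per-axis normalisation, the softmax form, and the `temperature` value are all hypothetical choices made for this example. Hour and month are treated as two circular axes, so 23:00 and 01:00 are two hours apart and December and January are one month apart.

```python
import numpy as np

def cyclic_diff(a, b, period):
    """Shortest absolute difference between a and b on a circle of length `period`."""
    d = np.abs(a - b) % period
    return np.minimum(d, period - d)

def toroidal_distance(hour_a, month_a, hour_b, month_b):
    """Distance on a flat torus; each axis is scaled so its largest gap is 1 (assumption)."""
    dh = cyclic_diff(hour_a, hour_b, 24.0) / 12.0   # largest possible hour gap is 12 h
    dm = cyclic_diff(month_a, month_b, 12.0) / 6.0  # largest possible month gap is 6 months
    return np.sqrt(dh ** 2 + dm ** 2)

def soft_targets(hours, months, temperature=0.5):
    """Row-normalised soft targets from pairwise toroidal distances (assumed softmax form)."""
    d = toroidal_distance(hours[:, None], months[:, None],
                          hours[None, :], months[None, :])
    logits = -d / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

# Three hypothetical capture times: 23:00 in December, 01:00 in January, 12:00 in June.
hours = np.array([23.0, 1.0, 12.0])
months = np.array([12.0, 1.0, 6.0])
targets = soft_targets(hours, months)
```

In this sketch the first two samples receive a high mutual target weight because they are close on both circular axes, while the June noon sample receives a low one. A hard-label contrastive objective would instead treat every non-identical pair as an equally wrong negative.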

David G. Shatwell, Ishan Rajendrakumar Dave, Sirnam Swetha, Mubarak Shah • 2025

Related benchmarks

Task	Dataset	Metric	Result
Classification	Country	Accuracy	92.5	46
Time Prediction	CVT (test)	ToY Error	65.1	16
Time Prediction	TIGeR 86k (test)	ToY Error	74.58	16
Regression	SatBird	Top-K Score	63.3	10
Classification	Wildfire	Average Precision (AP)	79.5	10
Regression	Air Temperature	R²	0.942	10
Regression	Median Income	R²	0.458	10
Geo-localization	CVT (test)	Recall@200km	42.63	8
Geo-localization	TIGeR 86k (test)	Recall@200km	21.07	8
Geo-time Aware Image Retrieval	CVT	R@1	16.45	5

