Learnable Query Aggregation with KV Routing for Cross-view Geo-localisation

About

Cross-view geo-localisation (CVGL) aims to estimate the geographic location of a query image by matching it against images in a large-scale database. However, the large viewpoint discrepancy between views makes effective feature aggregation and alignment difficult. To address these challenges, we propose a novel CVGL system that incorporates three key improvements. First, we use a DINOv2 backbone fine-tuned with a convolutional adapter to improve the model's adaptability to cross-view variations. Second, we propose a multi-scale channel reallocation module that strengthens the diversity and stability of spatial representations. Finally, we propose an improved aggregation module that integrates Mixture-of-Experts (MoE) routing into the feature aggregation process: within a cross-attention framework, the module dynamically selects expert subspaces for the keys and values, enabling adaptive processing of heterogeneous input domains. Extensive experiments on the University-1652 and SUES-200 datasets show that our method achieves competitive performance while training fewer parameters.
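The abstract does not give the routing details, so the following is only a minimal NumPy sketch of the general idea (learnable queries cross-attending over patch tokens whose key and value projections are mixed from several expert subspaces). It rests on assumptions of ours, not the authors': routing is per token with top-k softmax gating (as in common sparse MoE layers), each expert is a single linear key/value projection, and the pooled query outputs are flattened and L2-normalised into a global descriptor. All function names, shapes and the values of N, d, M, h, E and k are hypothetical, and random arrays stand in for trained weights and DINOv2 patch features.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def route_top_k(tokens, w_router, k):
    """Per-token top-k expert gating (assumed routing scheme).

    Returns an (N, E) gate matrix: each row is zero except for the k
    selected experts, whose weights are a softmax over their logits.
    """
    logits = tokens @ w_router                      # (N, E)
    top = np.argsort(-logits, axis=1)[:, :k]        # indices of the k largest logits
    gates = np.zeros_like(logits)
    sel = np.take_along_axis(logits, top, axis=1)   # (N, k)
    np.put_along_axis(gates, top, softmax(sel, axis=1), axis=1)
    return gates


def kv_routed_aggregation(tokens, queries, w_q, w_k, w_v, w_router, k=2):
    """Aggregate N patch tokens into one descriptor with M learnable queries.

    tokens:   (N, d) patch features (assumed to come from the backbone)
    queries:  (M, d) learnable query vectors
    w_q:      (d, h) shared query projection
    w_k, w_v: (E, d, h) per-expert key/value projections (hypothetical)
    w_router: (d, E) router weights (hypothetical)
    """
    gates = route_top_k(tokens, w_router, k)                  # (N, E)
    # For clarity, project every token with every expert and let the gates
    # zero out unselected experts; a real sparse MoE computes only the top-k.
    k_all = np.einsum("nd,edh->neh", tokens, w_k)             # (N, E, h)
    v_all = np.einsum("nd,edh->neh", tokens, w_v)             # (N, E, h)
    keys = np.einsum("ne,neh->nh", gates, k_all)              # (N, h)
    values = np.einsum("ne,neh->nh", gates, v_all)            # (N, h)

    # Scaled dot-product cross-attention: queries attend over routed keys.
    q = queries @ w_q                                         # (M, h)
    attn = softmax(q @ keys.T / np.sqrt(q.shape[1]), axis=1)  # (M, N)
    pooled = attn @ values                                    # (M, h)
    desc = pooled.reshape(-1)                                 # (M * h,)
    return desc / np.linalg.norm(desc)                        # L2-normalised


# Toy usage with random stand-ins for features and trained weights.
rng = np.random.default_rng(0)
N, d, M, h, E = 256, 64, 8, 16, 4
desc = kv_routed_aggregation(
    rng.standard_normal((N, d)), rng.standard_normal((M, d)),
    rng.standard_normal((d, h)), rng.standard_normal((E, d, h)),
    rng.standard_normal((E, d, h)), rng.standard_normal((d, E)), k=2)
print(desc.shape)  # (128,)
```

Routing the keys and values rather than the queries lets each token choose how it is projected while the learnable queries stay shared across all inputs; this is one possible reading of "selects expert subspaces for the keys and values", not a description of the published module.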

Hualin Ye, Bingxi Liu, Jixiang Du, Yu Qin, Ziyi Chen, Hong Zhang • 2025

Related benchmarks

Task	Dataset	Result
Cross-view geo-localization	University-1652 Drone -> Satellite	R@1: 94.41	149
Cross-view geo-localization	University-1652 Satellite -> Drone	R@1: 96.72	112
Drone-to-Satellite Retrieval	SUES-200 150m	R@1: 98.75	98
Drone-to-Satellite Retrieval	SUES-200 250m	R@1: 98.75	76
Drone-to-Satellite Retrieval	SUES-200 200m	R@1 accuracy: 98.75	66
Drone-to-Satellite Retrieval	SUES-200 300m	R@1: 98.85	66
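All results above are reported as R@1 (Recall@1). As a rough illustration of how this retrieval metric is conventionally computed, here is a small NumPy sketch: a query counts as a hit if at least one true match appears among its top-k gallery results, ranked by cosine similarity of L2-normalised descriptors. The function name, the boolean `positives` matrix (which allows a query to have several true matches) and the toy data are our own assumptions; the exact evaluation protocol of each benchmark is not described here.

```python
import numpy as np


def recall_at_k(query_desc, gallery_desc, positives, k=1):
    """Percentage of queries with at least one true match in the top-k results.

    query_desc:   (Q, D) L2-normalised query descriptors (e.g. drone views)
    gallery_desc: (G, D) L2-normalised gallery descriptors (e.g. satellite tiles)
    positives:    (Q, G) boolean matrix, True where gallery item g matches query q
    """
    sims = query_desc @ gallery_desc.T          # cosine similarity, (Q, G)
    top_k = np.argsort(-sims, axis=1)[:, :k]    # best-k gallery indices per query
    hits = np.take_along_axis(positives, top_k, axis=1).any(axis=1)
    return 100.0 * hits.mean()


# Toy data: 4 queries, 4 gallery items, query i truly matches gallery item i.
gallery = np.eye(4)
queries = np.array([
    [0.9, 0.1, 0.0, 0.0],  # ranks item 0 first (hit)
    [0.1, 0.8, 0.1, 0.0],  # ranks item 1 first (hit)
    [0.0, 0.1, 0.9, 0.0],  # ranks item 2 first (hit)
    [0.7, 0.0, 0.0, 0.3],  # ranks item 0 first, true match 3 is second (miss at k=1)
])
queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)
positives = np.eye(4, dtype=bool)
print(recall_at_k(queries, gallery, positives, k=1))  # 75.0
```

With `k=2` the fourth query's true match enters the shortlist, so the same toy data gives 100.0.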

