Robust Drone-View Geo-Localization via Content-Viewpoint Disentanglement
About
Drone-view geo-localization (DVGL) aims to match images of the same geographic location captured from drone and satellite perspectives. Despite recent advances, DVGL remains challenging due to significant appearance changes and spatial distortions caused by viewpoint variations. Existing methods typically assume that drone and satellite images can be directly aligned in a shared feature space via contrastive learning. Nonetheless, this assumption overlooks the inherent conflicts induced by viewpoint discrepancies, resulting in extracted features containing inconsistent information that hinders precise localization. In this study, we take a manifold learning perspective and model $\textit{the feature space of cross-view images as a composite manifold jointly governed by content and viewpoint}$. Building upon this insight, we propose $\textbf{CVD}$, a new DVGL framework that explicitly disentangles $\textit{content}$ and $\textit{viewpoint}$ factors. To promote effective disentanglement, we introduce two constraints: $\textit{(i)}$ an intra-view independence constraint that encourages statistical independence between the two factors by minimizing their mutual information; and $\textit{(ii)}$ an inter-view reconstruction constraint that reconstructs each view by cross-combining $\textit{content}$ and $\textit{viewpoint}$ from paired images, ensuring factor-specific semantics are preserved. As a plug-and-play module, CVD integrates seamlessly into existing DVGL pipelines and reduces inference latency. Extensive experiments on University-1652 and SUES-200 show that CVD exhibits strong robustness and generalization across various scenarios, viewpoints and altitudes, with further evaluations on CVUSA and CVACT confirming consistent improvements.
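
The inter-view reconstruction constraint can be illustrated with a toy sketch. This is not the CVD implementation: `split_factors`, `decode`, and `cross_reconstruction_loss` are hypothetical names, the "encoder" is a simple slice, the "decoder" is a concatenation, and the choice to pair the other view's content with a view's own viewpoint is an assumed reading of "cross-combining". The point is only that when paired drone and satellite features share the same content factor, swapping content between them leaves each reconstruction unchanged.

```python
# Toy illustration only: the encoder and decoder are hypothetical stand-ins,
# not the architecture used by CVD.

def split_factors(feature, content_dim):
    """Hypothetical encoder: the first `content_dim` entries are treated as
    content, the remaining entries as viewpoint."""
    return feature[:content_dim], feature[content_dim:]


def decode(content, viewpoint):
    """Hypothetical decoder: concatenates the two factors into one feature."""
    return content + viewpoint


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_reconstruction_loss(drone_feat, sat_feat, content_dim):
    c_d, v_d = split_factors(drone_feat, content_dim)
    c_s, v_s = split_factors(sat_feat, content_dim)
    # Rebuild each view from the other view's content and its own viewpoint
    # (assumed pairing).
    drone_hat = decode(c_s, v_d)
    sat_hat = decode(c_d, v_s)
    return mse(drone_hat, drone_feat) + mse(sat_hat, sat_feat)


# Same location: identical content, different viewpoint.
matched = cross_reconstruction_loss([1.0, 2.0, 0.5], [1.0, 2.0, -0.5], content_dim=2)
# Different locations: content differs, so the swap is penalized.
mismatched = cross_reconstruction_loss([1.0, 2.0, 0.5], [3.0, 2.0, -0.5], content_dim=2)
print(matched, mismatched)
```

In this toy setting the loss is zero for the matched pair and positive for the mismatched one, which is the behaviour a reconstruction constraint of this kind is meant to encourage in a learned encoder.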
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Drone-to-Satellite Cross-view Geo-localization | SUES-200 150m | R@1 | 95.8 | 74 |
| Drone-to-Satellite Cross-view Geo-localization | SUES-200 250m | R@1 | 97.6 | 49 |
| Cross-view Geo-localization (Drone to Satellite) | SUES-200 300m altitude | R@1 | 98.65 | 48 |
| Cross-view Geo-localization (Satellite to Drone) | SUES-200 300m altitude | R@1 | 97.5 | 47 |
| Cross-view geo-localization | SUES-200 Satellite→Drone (200m) | R@1 | 97.5 | 41 |
| Cross-view Geo-localization (Satellite to Drone) | SUES-200 250m altitude | R@1 | 96.25 | 38 |
| Drone-to-Satellite Cross-view Geo-localization | SUES-200 200m | R@1 | 97.1 | 37 |
| Cross-view geo-localization | SUES-200 Satellite→Drone 150m | R@1 | 96.25 | 30 |
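
The results above are Recall@1 scores. As a rough sketch of how such a metric is commonly computed for cross-view retrieval, the snippet below ranks gallery features by cosine similarity and counts a query as a hit if a gallery item with the same location appears in the top k. The function names, the `(location_id, feature)` data layout, and the toy vectors are assumptions for illustration and are not taken from the benchmark's evaluation code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def recall_at_k(queries, gallery, k=1):
    """Percentage of queries whose top-k gallery matches include the same
    location. `queries` and `gallery` are lists of (location_id, feature)."""
    hits = 0
    for q_id, q_feat in queries:
        # Rank gallery items from most to least similar to the query.
        ranked = sorted(gallery, key=lambda item: cosine(q_feat, item[1]), reverse=True)
        top_ids = [g_id for g_id, _ in ranked[:k]]
        hits += q_id in top_ids
    return 100.0 * hits / len(queries)


# Toy drone queries and satellite gallery (made-up 2-D features).
drone_queries = [("A", [1.0, 0.1]), ("B", [0.1, 1.0]), ("C", [0.9, 0.2])]
satellite_gallery = [("A", [1.0, 0.0]), ("B", [0.0, 1.0]), ("C", [0.7, 0.7])]

print(recall_at_k(drone_queries, satellite_gallery, k=1))
print(recall_at_k(drone_queries, satellite_gallery, k=2))
```

In this toy data, query C is closest to gallery item A, so it counts as a miss at k=1 and as a hit at k=2.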