BGG: Bridging the Geometric Gap between Cross-View images by Vision Foundation Model Adaptation for Geo-Localization
About
Geometric differences between cross-view images, such as drone and satellite views, make Cross-View Geo-Localization (CVGL), which determines the geolocation of an image through image retrieval, considerably harder. To improve CVGL performance, this paper proposes BGG, a parameter-efficient adaptation framework that bridges the geometric gap between cross-view images on top of a vision foundation model (VFM) such as DINOv3. BGG leverages the general visual representations and generalization capabilities of the VFM to capture robust, consistent features from cross-view images. It consists mainly of a Multi-granularity Feature Enhancement Adapter (MFEA) and a Frequency-Aware Structural Aggregation (FASA) module. MFEA improves the scale adaptability and viewpoint robustness of features through multi-level dilated convolutions, bridging the cross-view geometric gap at a small training cost. Because the [CLS] token lacks the spatial detail needed for precise image retrieval and localization, the FASA module modulates patch tokens in the frequency domain and aggregates them adaptively to enhance local structural features. Finally, BGG fuses the enhanced local features with the [CLS] token for more accurate CVGL. Extensive experiments on the University-1652 and SUES-200 datasets show that BGG outperforms other methods and achieves state-of-the-art localization performance with low training costs.
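The abstract names the building blocks but not their exact design, so the following is only a rough NumPy sketch of how the pieces could fit together, not the authors' implementation. It assumes patch tokens arranged as an `(H, W, C)` grid, depthwise 3x3 kernels for the dilated branches, a fixed per-frequency gain standing in for a learned filter, and a single scoring vector for the adaptive aggregation. All function names, the dilation rates, and the concatenation-based fusion are hypothetical choices made for illustration.

```python
import numpy as np

def dilated_depthwise_conv3x3(x, w, dilation):
    """Depthwise 3x3 convolution with zero padding. x: (H, W, C), w: (3, 3, C)."""
    h, wd, _ = x.shape
    d = dilation
    xp = np.pad(x, ((d, d), (d, d), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Tap (i, j) reads the input shifted by (i - 1) * d and (j - 1) * d.
            out += w[i, j] * xp[i * d:i * d + h, j * d:j * d + wd]
    return out

def multi_dilation_adapter(x, weights, dilations=(1, 2, 3)):
    """Assumed MFEA-like adapter: parallel dilated branches, averaged, plus residual."""
    branches = [dilated_depthwise_conv3x3(x, w, d) for w, d in zip(weights, dilations)]
    return x + np.mean(branches, axis=0)

def frequency_modulate(x, freq_filter):
    """Scale each spatial frequency of the token grid. freq_filter: (H, W)."""
    spec = np.fft.fft2(x, axes=(0, 1))
    # Keep the real part; a non-symmetric filter would leave an imaginary residue.
    return np.fft.ifft2(spec * freq_filter[..., None], axes=(0, 1)).real

def adaptive_aggregate(x, score_vec):
    """Softmax-weighted pooling of patch tokens into one (C,) vector."""
    tokens = x.reshape(-1, x.shape[-1])
    logits = tokens @ score_vec
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ tokens

def fuse(cls_token, local_feat):
    """Assumed fusion: concatenate [CLS] with the local feature, then L2-normalize."""
    desc = np.concatenate([cls_token, local_feat])
    return desc / np.linalg.norm(desc)

rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
patches = rng.standard_normal((H, W, C))
cls_token = rng.standard_normal(C)
kernels = [0.1 * rng.standard_normal((3, 3, C)) for _ in range(3)]

adapted = multi_dilation_adapter(patches, kernels)
modulated = frequency_modulate(adapted, np.ones((H, W)))  # all-ones gain = identity
descriptor = fuse(cls_token, adaptive_aggregate(modulated, rng.standard_normal(C)))
```

In a trainable version the kernels, the frequency gain, and the scoring vector would be learned while the VFM backbone stays frozen, which is what makes this style of adaptation parameter-efficient.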
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Cross-view geo-localization | University-1652 Drone -> Satellite | R@1 96.24 | 149 |
| Cross-view geo-localization | University-1652 Satellite -> Drone | R@1 97.57 | 112 |
| Drone-to-Satellite Cross-view Geo-localization | SUES-200 150m | R@1 99.3 | 74 |
| Drone-to-Satellite Cross-view Geo-localization | SUES-200 250m | R@1 99.53 | 49 |
| Cross-view Geo-localization (Drone to Satellite) | SUES-200 300m altitude | R@1 99.25 | 48 |
| Cross-view Geo-localization (Satellite to Drone) | SUES-200 300m altitude | R@1 98.75 | 47 |
| Cross-view geo-localization | SUES-200 Satellite→Drone (200m) | R@1 98.75 | 41 |
| Cross-view Geo-localization (Satellite to Drone) | SUES-200 250m altitude | R@1 98.75 | 38 |
| Drone-to-Satellite Cross-view Geo-localization | SUES-200 200m | R@1 99.45 | 37 |
| Cross-view geo-localization | SUES-200 Satellite→Drone 150m | R@1 98.75 | 30 |
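The R@1 values above are retrieval scores: a query counts as a hit when a gallery image of the same location appears among its top-k nearest neighbours. The sketch below shows one common way to compute this with cosine similarity. The function name `recall_at_k` and the toy features are hypothetical, and the exact evaluation protocol of each benchmark may differ.

```python
import numpy as np

def recall_at_k(query_feats, gallery_feats, query_labels, gallery_labels, k=1):
    """Percentage of queries whose top-k gallery matches include the true location."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                              # cosine similarity, (num_query, num_gallery)
    topk = np.argsort(-sims, axis=1)[:, :k]     # indices of the k most similar gallery items
    hits = (gallery_labels[topk] == query_labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Toy example: three gallery locations, three queries, the last one mismatched at rank 1.
gallery = np.eye(3)
gallery_ids = np.array([0, 1, 2])
queries = np.array([[1.0, 0.1, 0.0],
                    [0.0, 1.0, 0.1],
                    [1.0, 0.0, 0.2]])
query_ids = np.array([0, 1, 2])

r1 = recall_at_k(queries, gallery, query_ids, gallery_ids, k=1)
r2 = recall_at_k(queries, gallery, query_ids, gallery_ids, k=2)
```

Here two of the three queries retrieve the correct location first, and the third finds it at rank 2, so `r1` is about 66.7 and `r2` is 100.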