VAGeo: View-specific Attention for Cross-View Object Geo-Localization
About
Cross-view object geo-localization (CVOGL) aims to locate, within a satellite image, an object of interest marked in a ground- or drone-view query image. However, existing works treat ground-view and drone-view query images equivalently, overlooking both their inherent viewpoint discrepancies and the spatial correlation between the query image and the satellite-view reference image. To address this, this paper proposes a novel View-specific Attention Geo-localization method (VAGeo) for accurate CVOGL. Specifically, VAGeo contains two key modules: a view-specific positional encoding (VSPE) module and a channel-spatial hybrid attention (CSHA) module. At the object level, the VSPE module uses positional encodings tailored to the viewpoint characteristics of ground and drone query images, so that the object indicated by the click point in the query image is identified more accurately. At the feature level, the CSHA module introduces a hybrid attention that combines channel attention and spatial attention mechanisms to learn discriminative features. Extensive experiments show that VAGeo achieves a significant performance improvement on the CVOGL dataset, raising acc@0.25/acc@0.5 from 45.43%/42.24% to 48.21%/45.22% for ground-view queries and from 61.97%/57.66% to 66.19%/61.87% for drone-view queries.
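The abstract does not say how CSHA combines its two attention branches. For orientation only, the sketch below follows the widely used CBAM-style pattern (per-channel weights from average- and max-pooled descriptors passed through a shared MLP, followed by per-pixel weights from channel-wise average and max maps). This is an assumption, not VAGeo's published module. To stay dependency-free it uses nested lists shaped `[C][H][W]`, and it replaces CBAM's 7×7 convolution in the spatial branch with two hypothetical scalar weights, `alpha` and `beta`; all function names are illustrative.

```python
import math
from typing import List

# Feature map as nested lists shaped [C][H][W] (assumption for a dependency-free sketch).
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def shared_mlp(v: List[float], w1: List[List[float]], w2: List[List[float]]) -> List[float]:
    # Bias-free two-layer MLP with ReLU: R^C -> R^r -> R^C.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(f: FeatureMap, w1, w2) -> FeatureMap:
    # Squeeze each channel into one average and one max descriptor.
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(max(row) for row in ch) for ch in f]
    # One weight per channel: sigmoid(MLP(avg) + MLP(max)).
    weights = [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    return [[[x * wc for x in row] for row in ch] for ch, wc in zip(f, weights)]


def spatial_attention(f: FeatureMap, alpha: float, beta: float) -> FeatureMap:
    # One weight per pixel from the channel-wise mean and max at that pixel.
    # (alpha, beta) is a hypothetical stand-in for CBAM's 7x7 convolution.
    c_dim, h_dim, w_dim = len(f), len(f[0]), len(f[0][0])
    out = [[[0.0] * w_dim for _ in range(h_dim)] for _ in range(c_dim)]
    for i in range(h_dim):
        for j in range(w_dim):
            column = [f[c][i][j] for c in range(c_dim)]
            s = sigmoid(alpha * sum(column) / c_dim + beta * max(column))
            for c in range(c_dim):
                out[c][i][j] = f[c][i][j] * s
    return out


def hybrid_attention(f: FeatureMap, w1, w2, alpha: float = 1.0, beta: float = 1.0) -> FeatureMap:
    # Channel attention first, then spatial attention (CBAM's sequential order).
    return spatial_attention(channel_attention(f, w1, w2), alpha, beta)


# Example: C = 2 channels on a 2x2 grid, reduced to r = 1 hidden unit in the shared MLP.
f = [[[1.0, 2.0], [3.0, 4.0]], [[-1.0, 0.0], [5.0, 2.0]]]
y = hybrid_attention(f, w1=[[0.5, -0.5]], w2=[[1.0], [-1.0]])
```

Because every channel weight and every pixel weight is a sigmoid output in (0, 1), the hybrid attention can only rescale features, never amplify them; in a real network the MLP and convolution weights would be learned jointly with the rest of the model.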
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Cross-View Object Geo-Localization (Drone → Satellite) | CVOGL R (val) | Acc@0.5 | 59.59 | 35 |
| Cross-View Object Geo-Localization (Drone → Satellite) | CVOGL R (test) | Acc@0.5 | 61.87 | 35 |
| Cross-View Object Geo-Localization | CVOGL Ground → Satellite (HBox) standard (val) | Acc@0.5 | 44.42 | 11 |
| Cross-View Object Geo-Localization | CVOGL Ground → Satellite (HBox) standard (test) | Acc@0.5 | 45.22 | 11 |
| Cross-View Single-Object Geo-Localization | CVOGL-SVI (test) | Acc@0.25 | 48.21 | 8 |
| Cross-View Single-Object Geo-Localization | CVOGL-SVI (val) | Acc@0.25 | 47.56 | 8 |
| Cross-View Single-Object Geo-Localization | CVOGL-Drone (test) | Acc@0.25 | 66.19 | 8 |
| Cross-View Single-Object Geo-Localization | CVOGL-Drone (val) | Acc@0.25 | 64.25 | 8 |
| Cross-View Multi-Object Geo-Localization | CMLocation V1 (test) | Acc@0.25 | 57.79 | 6 |
| Cross-View Multi-Object Geo-Localization | CMLocation V1 (val) | Acc@0.25 | 57.04 | 6 |
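The results above are reported as Acc@0.25 and Acc@0.5. Assuming the usual CVOGL convention, a query counts as correctly localized when the predicted bounding box in the satellite image reaches an intersection-over-union (IoU) at or above the threshold with the ground-truth box, and the metric is the percentage of such queries. The sketch below illustrates that definition; the `(x1, y1, x2, y2)` box format and the names `iou` and `acc_at` are illustrative assumptions, not taken from the VAGeo code.

```python
from typing import Sequence, Tuple

# Axis-aligned box (x1, y1, x2, y2) in satellite-image pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    # Width and height of the intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def acc_at(preds: Sequence[Box], gts: Sequence[Box], threshold: float) -> float:
    # Percentage of queries whose predicted box reaches IoU >= threshold with the ground truth.
    if len(preds) != len(gts) or not preds:
        raise ValueError("need the same, non-zero number of predictions and ground truths")
    hits = sum(1 for p, g in zip(preds, gts) if iou(p, g) >= threshold)
    return 100.0 * hits / len(preds)


# Three toy queries: an exact hit, a half-shifted box (IoU = 1/3), and a complete miss.
preds = [(0, 0, 10, 10), (0, 0, 10, 10), (20, 20, 30, 30)]
gts = [(0, 0, 10, 10), (5, 0, 15, 10), (0, 0, 10, 10)]
print(acc_at(preds, gts, 0.25), acc_at(preds, gts, 0.5))
```

The half-shifted box passes the 0.25 threshold but not the 0.5 one, which is why Acc@0.25 is always at least as high as Acc@0.5 for the same predictions, as in every row of the table.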