SupScene: Scene-Structured Overlap Supervision for Image Retrieval in Unconstrained SfM
About
Image retrieval is a critical step for reducing the quadratic cost of image matching in unconstrained Structure-from-Motion (SfM). Unlike generic image retrieval, however, retrieval for SfM must identify geometrically matchable image pairs rather than merely semantically similar images. Prevailing methods are largely trained under anchor-centric tuple guidance, which organizes training around isolated tuples and under-utilizes the dense, graded overlap structure naturally present in an SfM scene. In this work, we present SupScene, a scene-structured training framework that samples connected local subgraphs from SfM overlap graphs and jointly supervises all valid pairwise relations within each subgraph. To explicitly align the learned descriptor with geometric co-visibility, we further introduce an overlap-ordered objective that combines multi-similarity optimization with a continuous relative-overlap ranking term. In addition, the framework is instantiated with a lightweight Structural Context Probe Pooling (SCPP) head that aggregates complementary structural responses into a compact global descriptor. Extensive experiments on multiple benchmarks demonstrate that our method significantly improves overall retrieval performance and enhances the completeness of downstream SfM reconstructions. Code and models are available at https://github.com/Suxilan/SupScene.
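The abstract describes subgraph sampling and the overlap-ordered objective only at a high level. The following is a minimal sketch of both ideas under stated assumptions, not the SupScene implementation. It assumes the overlap graph is a symmetric dict of co-visibility scores in (0, 1], grows a connected subgraph by randomized breadth-first search, treats pairs with overlap at least `pos_thresh` as positives and non-adjacent pairs as negatives, applies the standard multi-similarity loss (without its pair-mining step), and adds a hinge ranking term whose margin scales with the overlap gap. All function names, parameters, default values, and the hinge form of the ranking term are hypothetical.

```python
import math
import random
from collections import deque
from itertools import combinations


def make_overlap(edges):
    """Build a symmetric overlap graph from (image_a, image_b, overlap) triples."""
    graph = {}
    for a, b, o in edges:
        graph.setdefault(a, {})[b] = o
        graph.setdefault(b, {})[a] = o
    return graph


def sample_connected_subgraph(overlap, seed, max_nodes, rng):
    """Grow a connected node set from `seed` by randomized BFS over overlap edges."""
    nodes, seen, frontier = [seed], {seed}, deque([seed])
    while frontier and len(nodes) < max_nodes:
        neighbors = [n for n in overlap.get(frontier.popleft(), {}) if n not in seen]
        rng.shuffle(neighbors)
        for n in neighbors[: max_nodes - len(nodes)]:
            seen.add(n)
            nodes.append(n)
            frontier.append(n)
    return nodes


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def overlap_ordered_loss(nodes, desc, overlap, pos_thresh=0.1, alpha=2.0,
                         beta=50.0, base=0.5, margin=0.2):
    """Return (multi-similarity term, relative-overlap ranking term) over all
    within-subgraph pairs. Hypothetical sketch; weights and margin are illustrative."""
    def ov(i, j):
        return overlap.get(i, {}).get(j, 0.0)

    ms_total, rank_terms = 0.0, []
    for i in nodes:
        others = [j for j in nodes if j != i]
        sim = {j: cosine(desc[i], desc[j]) for j in others}
        pos = [j for j in others if ov(i, j) >= pos_thresh]
        neg = [j for j in others if ov(i, j) == 0.0]
        # Multi-similarity loss per anchor (pair mining omitted for brevity).
        ms_total += math.log1p(sum(math.exp(-alpha * (sim[j] - base)) for j in pos)) / alpha
        ms_total += math.log1p(sum(math.exp(beta * (sim[j] - base)) for j in neg)) / beta
        # Ranking: the higher-overlap partner j should be more similar than k,
        # by a margin proportional to the overlap gap.
        for j, k in combinations(others, 2):
            if ov(i, j) < ov(i, k):
                j, k = k, j
            gap = ov(i, j) - ov(i, k)
            if gap > 0:
                rank_terms.append(max(0.0, sim[k] - sim[j] + margin * gap))
    ms = ms_total / len(nodes)
    rank = sum(rank_terms) / len(rank_terms) if rank_terms else 0.0
    return ms, rank


if __name__ == "__main__":
    graph = make_overlap([("A", "B", 0.8), ("B", "C", 0.5), ("A", "C", 0.3),
                          ("C", "D", 0.2), ("D", "E", 0.1)])
    rng = random.Random(0)
    sub = sample_connected_subgraph(graph, "A", 4, rng)
    desc = {n: [rng.random() for _ in range(8)] for n in graph}
    print(sub, overlap_ordered_loss(sub, desc, graph))
```

Supervising every pair inside a connected subgraph, rather than one anchor per tuple, lets each image act as an anchor against all others in the batch; the graded overlap values are what make the ranking term meaningful.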
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Retrieval | GL3D official (test) | Recall@25 | 73 | 6 |
| Structure-from-Motion | 1DSfM Gendarmenmarkt | Registered Images | 1.05e+3 | 4 |
| Structure-from-Motion | 1DSfM Madrid Metropolis | Registered Images | 491 | 4 |
| Structure-from-Motion | 1DSfM Alamo | Registered Images | 972 | 4 |
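As a rough illustration of how global descriptors replace exhaustive matching, the sketch below forms candidate pairs from each image's top-k retrieved neighbors and computes one Recall@K variant. The benchmark's exact Recall@K protocol is not stated here; the per-query definition used (fraction of ground-truth overlapping partners found in the top k, averaged over queries that have at least one partner) is an assumption, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranked_neighbors(desc, query):
    """All other image ids, sorted by descending descriptor similarity to `query`."""
    others = [j for j in desc if j != query]
    return sorted(others, key=lambda j: cosine(desc[query], desc[j]), reverse=True)


def top_k_pairs(desc, k):
    """Unordered candidate pairs formed from each image's top-k retrieved neighbors."""
    pairs = set()
    for i in desc:
        for j in ranked_neighbors(desc, i)[:k]:
            pairs.add(tuple(sorted((i, j))))
    return pairs


def recall_at_k(desc, gt_pairs, k):
    """Per-query fraction of ground-truth partners in the top k, averaged over
    queries with at least one partner (assumed protocol)."""
    partners = {i: set() for i in desc}
    for a, b in gt_pairs:
        partners[a].add(b)
        partners[b].add(a)
    scores = []
    for i, gt in partners.items():
        if gt:
            hits = set(ranked_neighbors(desc, i)[:k]) & gt
            scores.append(len(hits) / len(gt))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    desc = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
    n = len(desc)
    print("exhaustive pairs:", n * (n - 1) // 2)  # → 6
    print("top-1 candidate pairs:", len(top_k_pairs(desc, 1)))  # → 2
```

With N images, exhaustive matching needs N(N-1)/2 pairs, while top-k retrieval yields at most N·k, so the matching cost grows linearly in N for fixed k.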