MARCO: Navigating the Unseen Space of Semantic Correspondence
About
Recent advances in semantic correspondence rely on dual-encoder architectures that combine DINOv2 with diffusion backbones. While accurate, these billion-parameter models generalize poorly beyond training keypoints, revealing a gap between benchmark performance and real-world usability, where queried points rarely match those seen during training. Building upon DINOv2, we introduce MARCO, a unified model for generalizable correspondence driven by a novel training framework that enhances both fine-grained localization and semantic generalization. By coupling a coarse-to-fine objective that refines spatial precision with a self-distillation framework that expands sparse supervision beyond annotated regions, our approach transforms a handful of keypoints into dense, semantically coherent correspondences. MARCO sets a new state of the art on SPair-71k, AP-10K, and PF-PASCAL, with gains that grow at fine-grained localization thresholds (+8.9 PCK@0.01) and the strongest generalization to unseen keypoints (+5.1 on SPair-U) and unseen categories (+4.7 on MP-100), while being 3x smaller and 10x faster than diffusion-based approaches. Code is available at https://github.com/visinf/MARCO.
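Correspondence models in this line of work typically read off a match by comparing dense per-patch features between a source and a target image. The sketch below shows only that generic nearest-neighbour step, using NumPy and random stand-in features; it is not MARCO's training framework, and the function name `match_keypoint` and the `(H, W, C)` feature-grid layout are illustrative assumptions.

```python
import numpy as np

def match_keypoint(src_feats, tgt_feats, query_yx):
    """Return the (row, col) on the target grid that best matches a source location.

    src_feats, tgt_feats: (H, W, C) arrays of per-patch descriptors
    (hypothetical stand-ins for backbone features such as DINOv2's).
    query_yx: (row, col) of the query point on the source feature grid.
    """
    # Descriptor of the query point, L2-normalised so a dot product is cosine similarity.
    q = src_feats[query_yx[0], query_yx[1]]
    q = q / np.linalg.norm(q)
    # L2-normalise every target descriptor along the channel axis.
    t = tgt_feats / np.linalg.norm(tgt_feats, axis=-1, keepdims=True)
    # Cosine similarity between the query and every target location: shape (H, W).
    sim = t @ q
    # Nearest neighbour = grid location with the highest similarity.
    row, col = np.unravel_index(np.argmax(sim), sim.shape)
    return int(row), int(col)

rng = np.random.default_rng(0)
src = rng.standard_normal((16, 16, 32))
# Target is the source shifted 3 columns to the right, so the true match is known.
tgt = np.roll(src, shift=3, axis=1)
print(match_keypoint(src, tgt, (5, 4)))  # (5, 7)
```

Because the argmax is taken on the patch grid, a match of this kind is only as precise as one patch (DINOv2 uses 14-pixel patches), which is presumably the sort of limitation a coarse-to-fine localization objective is meant to address.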
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic Correspondence | SPair-71k (test) | PCK@0.1 | 87.2 | 146 |
| Semantic Correspondence | PF-PASCAL | PCK @ alpha=0.1 | 96.9 | 107 |
| Semantic Correspondence | SPair-71k | PCK @ 0.01 | 27 | 22 |
| Semantic Correspondence | AP-10K Intra-species (test) | PCK@0.10 | 89.1 | 22 |
| Semantic Correspondence | AP-10K | PCK@0.1 (I.S.) | 89.1 | 15 |
| Semantic Correspondence | AP-10K cross-family | PCK@0.10 | 83.4 | 14 |
| Semantic Correspondence | SpairU | PCK@0.10 | 67.5 | 11 |
| Semantic Correspondence | AP-10K C.S. | PCK@0.10 | 88.3 | 10 |
| Semantic Correspondence | SPair-71k Geo-Aware | PCK@0.01 | 22.8 | 9 |
| Semantic Correspondence | SPair-U (Unseen keypoints) | Aeroplane Score | 86.6 | 8 |
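Most results above are reported as PCK@α (Percentage of Correct Keypoints): a predicted keypoint counts as correct if it lies within α · max(h, w) of the ground truth, where (h, w) is a benchmark-dependent reference size such as the object bounding box or the image. The sketch below is a minimal NumPy version under that definition; the function name `pck` and its arguments are assumptions for illustration, not the official evaluation code of any benchmark.

```python
import numpy as np

def pck(pred, gt, ref_size, alpha=0.1):
    """Percentage of Correct Keypoints.

    pred, gt: (N, 2) arrays of predicted and ground-truth (x, y) keypoints.
    ref_size: (height, width) used for normalisation (benchmark-dependent,
              e.g. object bounding box or image size).
    alpha: threshold as a fraction of max(height, width).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean error of each predicted keypoint.
    err = np.linalg.norm(pred - gt, axis=1)
    # A keypoint is correct if its error is within alpha * max(h, w).
    thresh = alpha * max(ref_size)
    return float(100.0 * np.mean(err <= thresh))

gt = np.array([[50.0, 50.0], [100.0, 20.0], [10.0, 80.0], [70.0, 70.0]])
# Per-keypoint errors: 5, 15, ~1.41, 30 pixels.
pred = gt + np.array([[3.0, 4.0], [0.0, 15.0], [1.0, 1.0], [30.0, 0.0]])
print(pck(pred, gt, ref_size=(200, 100), alpha=0.1))   # 75.0
print(pck(pred, gt, ref_size=(200, 100), alpha=0.01))  # 25.0
```

With a 200-pixel reference size, α = 0.1 tolerates 20 pixels of error while α = 0.01 tolerates only 2, which is why PCK@0.01 isolates fine-grained localization accuracy.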