# ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains

## About
Large-scale instance-level training data is scarce, so models are typically trained on domain-specific datasets. Yet in real-world retrieval, they must handle diverse domains, making generalization to unseen data critical. We introduce ELViS, an image-to-image similarity model that generalizes effectively to unseen domains. Unlike conventional approaches, our model operates in similarity space rather than representation space, promoting cross-domain transfer. It leverages local descriptor correspondences, refines their similarities through an optimal transport step with data-dependent gains that suppress uninformative descriptors, and aggregates strong correspondences via a voting process into an image-level similarity. This design injects strong inductive biases, yielding a simple, efficient, and interpretable model. To assess generalization, we compile a benchmark of eight datasets spanning landmarks, artworks, products, and multi-domain collections, and evaluate ELViS as a re-ranking method. Our experiments show that ELViS outperforms competing methods by a large margin in out-of-domain scenarios and on average, while requiring only a fraction of their computational cost. Code available at: https://github.com/pavelsuma/ELViS/
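The pipeline above (local descriptor correspondences, an optimal transport refinement, and aggregation of the strongest matches) can be sketched conceptually as follows. This is a hypothetical illustration, not the actual ELViS implementation: the entropic regularization `eps`, the number of Sinkhorn iterations, the `top_k` voting cutoff, and the plain exponential kernel are all illustrative assumptions, and the data-dependent gains that suppress uninformative descriptors are omitted for brevity.

```python
import numpy as np

def sinkhorn(K, n_iters=20):
    """Alternating row/column normalization of a positive kernel,
    approximating an optimal-transport plan with uniform marginals."""
    P = K.copy()
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # normalize rows
        P /= P.sum(axis=0, keepdims=True)  # normalize columns
    return P

def image_similarity(desc_q, desc_d, eps=0.05, top_k=10):
    """Conceptual image-level similarity from local descriptors.

    desc_q: (n, d) query descriptors, desc_d: (m, d) database descriptors,
    both assumed L2-normalized. Returns a single scalar similarity.
    """
    sim = desc_q @ desc_d.T            # pairwise cosine similarities
    K = np.exp(sim / eps)              # positive kernel for the OT step
    P = sinkhorn(K)                    # soft correspondence (transport plan)
    weighted = P * sim                 # refine similarities with the plan
    # "voting": aggregate the strongest correspondences into one score
    strongest = np.sort(weighted.ravel())[::-1][:top_k]
    return float(strongest.sum())
```

Because the score is built from a small set of explicit descriptor correspondences, one can inspect which local regions drove the match, which is the sense in which a design like this is interpretable.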
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Retrieval | INSTRE | mAP | 81.1 | 35 |
| Image Retrieval | Out-of-domain (OOD) | mAP | 68.7 | 28 |
| Image Retrieval | Overall (Average) | mAP | 60.6 | 21 |
| Image Retrieval | GLD v2 | mAP@100 | 56.9 | 21 |
| Image Retrieval | ILIAS | mAP | 18.8 | 14 |
| Image Retrieval | Met | mAP | 77.9 | 14 |
| Image Retrieval | Prod1M | mAP | 44.1 | 14 |
| Image Retrieval | RP2K | mAP | 59.2 | 14 |
| Image Retrieval | ROP+1M | mAP | 68.8 | 14 |
| Image Retrieval | SOP-1k | mAP | 54.9 | 14 |