# ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains

## About
Large-scale instance-level training data is scarce, so models are typically trained on domain-specific datasets. Yet in real-world retrieval, they must handle diverse domains, making generalization to unseen data critical. We introduce ELViS, an image-to-image similarity model that generalizes effectively to unseen domains. Unlike conventional approaches, our model operates in similarity space rather than representation space, promoting cross-domain transfer. It leverages local descriptor correspondences, refines their similarities through an optimal transport step with data-dependent gains that suppress uninformative descriptors, and aggregates strong correspondences via a voting process into an image-level similarity. This design injects strong inductive biases, yielding a simple, efficient, and interpretable model. To assess generalization, we compile a benchmark of eight datasets spanning landmarks, artworks, products, and multi-domain collections, and evaluate ELViS as a re-ranking method. Our experiments show that ELViS outperforms competing methods by a large margin in out-of-domain scenarios and on average, while requiring only a fraction of their computational cost. Code available at: https://github.com/pavelsuma/ELViS/
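The pipeline above (local descriptor correspondences, an optimal transport refinement, and aggregation of the strongest matches) can be sketched conceptually as follows. This is a hypothetical illustration, not the actual ELViS implementation: the entropic regularization `eps`, the number of Sinkhorn iterations, the `top_k` voting cutoff, and the plain exponential kernel are all illustrative assumptions, and the data-dependent gains that suppress uninformative descriptors are omitted for brevity.

```python
import numpy as np

def sinkhorn(K, n_iters=20):
    """Alternating row/column normalization of a positive kernel,
    approximating an optimal-transport plan with uniform marginals."""
    P = K.copy()
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # normalize rows
        P /= P.sum(axis=0, keepdims=True)  # normalize columns
    return P

def image_similarity(desc_q, desc_d, eps=0.05, top_k=10):
    """Conceptual image-level similarity from local descriptors.

    desc_q: (n, d) query descriptors, desc_d: (m, d) database descriptors,
    both assumed L2-normalized. Returns a single scalar similarity.
    """
    sim = desc_q @ desc_d.T            # pairwise cosine similarities
    K = np.exp(sim / eps)              # positive kernel for the OT step
    P = sinkhorn(K)                    # soft correspondence (transport plan)
    weighted = P * sim                 # refine similarities with the plan
    # "voting": aggregate the strongest correspondences into one score
    strongest = np.sort(weighted.ravel())[::-1][:top_k]
    return float(strongest.sum())
```

Because the score is built from a small set of explicit descriptor correspondences, one can inspect which local regions drove the match, which is the sense in which a design like this is interpretable.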
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Retrieval | INSTRE | mAP | 81.1 | 35 |
| Image Retrieval | Out-of-domain (OOD) | mAP | 68.7 | 28 |
| Image Retrieval | Overall (Average) | mAP | 60.6 | 21 |
| Image Retrieval | GLD v2 | mAP@100 | 56.9 | 21 |
| Image Retrieval | ILIAS | mAP | 18.8 | 14 |
| Image Retrieval | Met | mAP | 77.9 | 14 |
| Image Retrieval | Prod1M | mAP | 44.1 | 14 |
| Image Retrieval | RP2K | mAP | 59.2 | 14 |
| Image Retrieval | ROP+1M | mAP | 68.8 | 14 |
| Image Retrieval | SOP-1k | mAP | 54.9 | 14 |