Deploy DINO with Many-to-Many Association

About

Motivated by the limited generalization of supervised image matching models to unseen image domains, we explore the zero-shot deployment of DINO features for this task. The generalist visual representation extracted from DINO is inherently ambiguous when used to match feature points among semantically similar instances, prompting us to adopt a many-to-many (m-to-m) matching paradigm. However, the existing robust mechanism under m-to-m data association is computationally heavy: it requires finding a maximum-cardinality matching in the inlier association graph for each parameter evaluation. To address this inefficiency, we introduce a novel likelihood perspective, which interprets the existing method as a zeroth-order approximation of an otherwise intractable likelihood calculation and inspires us to propose a faster and finer-grained robust mechanism, termed Harmonic Consensus Maximization (HCM). Taking camera pose estimation as an exemplifying downstream task, we demonstrate that general-purpose visual features, used out of the box without any adaptation, can compete with specialized matching models on out-of-distribution datasets when paired with m-to-m association and the HCM mechanism.

Haodong Jiang, Mingzhe Li, Junfeng Wu • 2026
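
The abstract notes that the existing robust mechanism must find a maximum-cardinality matching in the inlier association graph every time a candidate parameter is evaluated. The sketch below illustrates that cost on a toy problem and is not the authors' implementation. It assumes a 2D translation-only model in place of camera pose, a fixed residual threshold eps, and hypothetical function names (inlier_edges, max_cardinality_matching, consensus_score). HCM itself is not shown, since the abstract does not specify its formula.

```python
from math import hypot

def inlier_edges(src, dst, candidates, t, eps):
    """Keep candidate pairs (i, j) whose residual under translation t is within eps.

    Assumption: a 2D translation stands in for the camera pose model.
    """
    edges = {}
    for i, j in candidates:
        dx = src[i][0] + t[0] - dst[j][0]
        dy = src[i][1] + t[1] - dst[j][1]
        if hypot(dx, dy) <= eps:
            edges.setdefault(i, []).append(j)
    return edges

def max_cardinality_matching(edges):
    """Size of a maximum bipartite matching, found with augmenting paths."""
    match_of_dst = {}  # dst index -> src index currently matched to it

    def try_augment(i, visited):
        for j in edges.get(i, []):
            if j in visited:
                continue
            visited.add(j)
            # Take j if it is free, or if its current partner can be re-routed.
            if j not in match_of_dst or try_augment(match_of_dst[j], visited):
                match_of_dst[j] = i
                return True
        return False

    return sum(try_augment(i, set()) for i in edges)

def consensus_score(src, dst, candidates, t, eps=0.1):
    """Consensus of parameter t: the largest one-to-one subset of inlier pairs."""
    return max_cardinality_matching(inlier_edges(src, dst, candidates, t, eps))

# Each source point has two candidate partners (m-to-m ambiguity).
src = [(0, 0), (1, 0), (2, 0)]
dst = [(1, 1), (2, 1), (3, 1)]
candidates = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 0)]
print(consensus_score(src, dst, candidates, t=(1, 1)))  # 3
print(consensus_score(src, dst, candidates, t=(2, 1)))  # 2

# Two source points compete for one target: two inlier edges, but only one match.
print(consensus_score([(0.0, 0.0), (0.05, 0.0)], [(1.0, 1.0)],
                      [(0, 0), (1, 0)], t=(1, 1)))  # 1
```

The last call shows why a plain inlier count is not enough under m-to-m association: both candidate pairs pass the threshold, yet only one of them can be part of a one-to-one correspondence. Solving a matching problem for every parameter evaluation is the overhead the abstract describes as computationally heavy.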

Related benchmarks

Task                             Dataset      Metric            Result  Rank
Camera relative pose estimation  NAVI-Wild    Pose AUC @ 10°    4.3     17
Camera relative pose estimation  NAVI Multi   AUC@10° (Pose)    4.1     17
Camera relative pose estimation  ScanNet      Pose AUC @ 10°    5.4     17
