Normalized Matching Transformer

About

We introduce the Normalized Matching Transformer (NMT), a deep learning approach for efficient and accurate sparse semantic keypoint matching between image pairs. NMT consists of a strong visual backbone, geometric feature refinement via SplineCNN, and a normalized Transformer that computes the matching features. Central to NMT is our hyperspherical normalization strategy: we enforce unit-norm embeddings at every Transformer layer and train with a combined contrastive InfoNCE and hyperspherical uniformity loss to yield more discriminative keypoint representations. This combination of architecture and loss encourages close alignment of matching image features and large distances between non-matching ones, not only at the output level but at every layer. Despite its architectural simplicity, NMT sets a new state of the art on PascalVOC and SPair-71k, outperforming BBGM, ASAR, COMMON, and GMTR by 5.1% on PascalVOC and 2.2% on SPair-71k, while converging in at least 1.7x fewer epochs than other state-of-the-art baselines. These results underscore the power of combining pervasive normalization with hyperspherical learning for matching tasks.
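To make the loss described above more concrete, here is a minimal sketch in plain Python of unit-norm embeddings trained with an InfoNCE term plus a uniformity term. It is an illustration under assumptions, not the authors' implementation: the function names, the temperature, the weighting, and the Gaussian-potential form of the uniformity term are all hypothetical choices, since the abstract does not specify them. The sketch also applies normalization only to the final embeddings, whereas NMT is described as normalizing at every Transformer layer.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(src, tgt, temperature=0.1):
    """InfoNCE over keypoint embeddings, where src[i] matches tgt[i].

    The temperature value is an assumption, not taken from the paper.
    """
    src = [l2_normalize(v) for v in src]
    tgt = [l2_normalize(v) for v in tgt]
    total = 0.0
    for i, s in enumerate(src):
        logits = [dot(s, t) / temperature for t in tgt]
        # Numerically stable log-sum-exp over all candidate matches.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(src)


def uniformity(points, t=2.0):
    """Gaussian-potential uniformity term (an assumed form).

    Lower values mean the normalized points are spread more evenly
    over the hypersphere. Requires at least two points.
    """
    points = [l2_normalize(v) for v in points]
    potentials = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            potentials.append(math.exp(-t * sq_dist))
    return math.log(sum(potentials) / len(potentials))


def combined_loss(src, tgt, temperature=0.1, weight=1.0):
    """InfoNCE plus a weighted uniformity term on both images' keypoints."""
    return info_nce(src, tgt, temperature) + weight * (
        uniformity(src) + uniformity(tgt)
    )
```

With two orthogonal source embeddings, `info_nce` is close to zero when the targets are in matching order and close to `1 / temperature` when they are swapped, which is the alignment behaviour the abstract describes. The uniformity term is minimized when embeddings of different keypoints lie far apart on the sphere.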

Abtin Pourhadi, Paul Swoboda • 2025

Related benchmarks

Task	Dataset	Result	Rank
Keypoint Matching	PASCALVOC with Berkeley keypoint annotations (test)	Hits@1 (Aero): 75.8	61
Sparse Semantic Keypoint Matching	SPair-71k (test)	Aeroplane: 79.3	5

