GMTR: Graph Matching Transformers

About

Vision transformers (ViTs) have recently been used for visual matching, beyond object detection and segmentation. However, the original grid-dividing strategy of ViTs neglects the spatial information of the keypoints, which limits their sensitivity to local information. We therefore propose QueryTrans (Query Transformer), which adopts a cross-attention module and a keypoint-based center crop strategy for better spatial information extraction. We further integrate a graph attention module and devise a transformer-based graph matching (GM) approach, GMTR (Graph Matching TRansformers), in which the combinatorial nature of GM is addressed by a graph-transformer neural GM solver. On standard GM benchmarks, GMTR is competitive with state-of-the-art (SOTA) frameworks. Specifically, on Pascal VOC, GMTR achieves $\mathbf{83.6\%}$ accuracy, $\mathbf{0.9}$ percentage points higher than the previous SOTA framework. On SPair-71k, GMTR outperforms most previous works. Meanwhile, on Pascal VOC, QueryTrans improves the accuracy of NGMv2 from $80.1\%$ to $\mathbf{83.3\%}$ and that of BBGM from $79.0\%$ to $\mathbf{84.5\%}$. On SPair-71k, QueryTrans improves NGMv2 from $80.6\%$ to $\mathbf{82.5\%}$ and BBGM from $82.1\%$ to $\mathbf{83.9\%}$. Source code will be made publicly available.
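The abstract does not spell out how QueryTrans builds its tokens, so the following is only one plausible reading, not the authors' implementation. It contrasts ViT-style grid dividing with a keypoint-based center crop (one patch centred on each keypoint, clamped to the image), then lets the keypoint patches act as queries in a cross-attention over the grid patches. All function names (`center_crop_origin`, `grid_tokens`, `keypoint_tokens`, `cross_attention`) are hypothetical, and the sketch assumes a grayscale image, a single attention head, and identity query/key/value projections instead of learned ones.

```python
import math


def center_crop_origin(kp, size, img_w, img_h):
    """Top-left corner of a size x size window centred on keypoint kp = (x, y),
    clamped so the window stays inside an img_w x img_h image.
    int() truncates, i.e. floors, for non-negative coordinates."""
    x, y = kp
    half = size // 2
    x0 = min(max(int(x) - half, 0), img_w - size)
    y0 = min(max(int(y) - half, 0), img_h - size)
    return x0, y0


def crop_flat(img, x0, y0, size):
    """Flatten the size x size window whose top-left corner is (x0, y0)."""
    return [img[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]


def grid_tokens(img, size):
    """ViT-style grid dividing: non-overlapping size x size patches, row-major."""
    h, w = len(img), len(img[0])
    return [crop_flat(img, x0, y0, size)
            for y0 in range(0, h - size + 1, size)
            for x0 in range(0, w - size + 1, size)]


def keypoint_tokens(img, keypoints, size):
    """One patch per keypoint, centred on it (hypothetical keypoint-based crop)."""
    h, w = len(img), len(img[0])
    tokens = []
    for kp in keypoints:
        x0, y0 = center_crop_origin(kp, size, w, h)
        tokens.append(crop_flat(img, x0, y0, size))
    return tokens


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention with identity projections:
    each keypoint query attends over every grid token."""
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value tokens.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


# Toy 8 x 8 grayscale image with intensities in [0, 1].
img = [[(8 * y + x) / 63.0 for x in range(8)] for y in range(8)]
grid = grid_tokens(img, 4)                                   # 4 tokens of length 16
queries = keypoint_tokens(img, [(1.0, 1.0), (6.5, 5.0)], 4)  # 2 keypoint tokens
fused, attn = cross_attention(queries, grid, grid)
```

Grid patches have fixed boundaries regardless of where keypoints fall, whereas the second keypoint's patch starts at row 3, so the keypoint at y = 5 lies near the middle of its own token instead of on a patch edge. In a real model the projections would be learned and the patches embedded before attention.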

Jinpei Guo, Shaofeng Zhang, Runzhong Wang, Chang Liu, Junchi Yan • 2023

Related benchmarks

Task	Dataset	Result
Keypoint matching	PASCAL VOC with Berkeley keypoint annotations (test)	Hits@1 (Aero): 69	61
Graph matching	SPair-71k (test)	Mean Accuracy: 83.2	46
Graph matching	PASCAL VOC with Berkeley annotations (test)	--	36
Sparse semantic keypoint matching	SPair-71k (test)	Aeroplane: 75.6	5
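Accuracy in keypoint-based GM benchmarks is commonly the fraction of ground-truth keypoint correspondences that the predicted one-to-one matching recovers; a per-class score such as Mean Accuracy then averages within each object category before averaging across categories. The sketch below assumes that convention (implementations differ on whether keypoints are pooled or averaged per image pair) and is not the evaluation code behind the figures above. Matchings are stored as hypothetical `{source_index: target_index}` dicts, and the samples are toy data, not benchmark results.

```python
def matching_accuracy(pred, gt):
    """Fraction of ground-truth correspondences (src -> tgt) reproduced by pred."""
    if not gt:
        raise ValueError("no ground-truth correspondences")
    correct = sum(1 for s, t in gt.items() if pred.get(s) == t)
    return correct / len(gt)


def mean_class_accuracy(samples):
    """samples: iterable of (class_name, pred, gt) image-pair results.
    Average per-pair accuracy within each class, then average the class means."""
    per_class = {}
    for cls, pred, gt in samples:
        per_class.setdefault(cls, []).append(matching_accuracy(pred, gt))
    class_means = {c: sum(v) / len(v) for c, v in per_class.items()}
    return sum(class_means.values()) / len(class_means), class_means


# Toy image pairs: (class, predicted matching, ground-truth matching).
samples = [
    ("aeroplane", {0: 0, 1: 2, 2: 1}, {0: 0, 1: 1, 2: 2}),  # 1 of 3 correct
    ("aeroplane", {0: 0, 1: 1}, {0: 0, 1: 1}),              # all correct
    ("car", {0: 1, 1: 0}, {0: 0, 1: 1}),                    # none correct
    ("car", {i: i for i in range(4)}, {i: i for i in range(4)}),  # all correct
]
overall, per_class = mean_class_accuracy(samples)
```

Averaging per class first keeps categories with many test pairs from dominating the headline number.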

