Learning Token-based Representation for Image Retrieval

About

In image retrieval, deep local features learned in a data-driven manner have been shown to improve retrieval performance. To make retrieval efficient on large image databases, some approaches quantize deep local features with a large codebook and match images with an aggregated match kernel. However, these approaches are computationally complex and have a large memory footprint, which limits their ability to jointly perform feature learning and aggregation. To generate compact global representations while maintaining regional matching capability, we propose a unified framework that jointly learns local feature representation and aggregation. In our framework, we first extract deep local features using CNNs. Then, we design a tokenizer module to aggregate them into a few visual tokens, each corresponding to a specific visual pattern. This helps remove background noise and capture more discriminative regions in the image. Next, a refinement block is introduced to enhance the visual tokens with self-attention and cross-attention. Finally, the visual tokens are concatenated to generate a compact global representation. The whole framework is trained end-to-end with image-level labels. Extensive experiments show that our approach outperforms state-of-the-art methods on the Revisited Oxford and Paris datasets.

Hui Wu, Min Wang, Wengang Zhou, Yang Hu, Houqiang Li • 2021
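
The pipeline described in the abstract (local features, tokenizer, refinement with self-attention and cross-attention, concatenation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain scaled dot-product attention with no learned projections, layer normalization, or feed-forward layers, and it uses random numbers in place of CNN features and learned pattern queries. The names `attend`, `global_descriptor`, and `pattern_queries` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    """Scaled dot-product attention: each query yields a weighted mean of values."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


def residual_add(a, b):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def global_descriptor(local_feats, pattern_queries):
    """Aggregate N local features into L tokens and concatenate them.

    Assumption: one query vector per visual pattern stands in for the
    learned tokenizer parameters.
    """
    # Tokenizer: each pattern query pools the local features into one token.
    tokens = attend(pattern_queries, local_feats, local_feats)
    # Refinement: self-attention among tokens, then cross-attention to local features.
    tokens = residual_add(tokens, attend(tokens, tokens, tokens))
    tokens = residual_add(tokens, attend(tokens, local_feats, local_feats))
    # Concatenate the tokens into a single L2-normalized global vector.
    return l2_normalize([x for token in tokens for x in token])


rng = random.Random(0)
C, N, L = 8, 49, 4  # channels, spatial locations, number of tokens (all illustrative)
local_feats = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(N)]
pattern_queries = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(L)]
desc = global_descriptor(local_feats, pattern_queries)
print(len(desc))  # 32, i.e. L * C
```

Because the descriptor is L2-normalized, two images can be compared with a single dot product, which is what keeps retrieval cheap relative to matching many local features per image.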

Related benchmarks

Task	Dataset	Metric	Value	
Image Retrieval	Revisited Oxford (ROxf) (Medium)	mAP	60.8	124
Image Retrieval	Revisited Paris (RPar) (Hard)	mAP	54.8	115
Image Retrieval	Oxford 5k	mAP	81.2	100
Image Retrieval	Revisited Oxford (ROxf) (Hard)	mAP	37.3	81
Image Retrieval	Paris Revisited (Medium)	mAP	75.8	63
Image Retrieval	Paris6k	mAP	89.6	45
Image Retrieval	RPar+R1M Medium	mAP	44.1	31
Image Retrieval	RPar+R1M Hard	mAP	19.7	31
Image Retrieval	ROxf + R1M	Retrieval Latency (s)	0.1042	10
Image Retrieval	RPar + R1M	Memory (GB)	0.1	10

Showing 10 of 12 rows
