Self-supervised Product Quantization for Deep Unsupervised Image Retrieval
About
Supervised deep learning-based hashing and vector quantization are enabling fast, large-scale image retrieval systems. By fully exploiting label annotations, they achieve outstanding retrieval performance compared to conventional methods. However, precisely labeling a vast amount of training data is painstaking, and the annotation process is error-prone. To tackle these issues, we propose the first deep unsupervised image retrieval method, dubbed the Self-supervised Product Quantization (SPQ) network, which is label-free and trained in a self-supervised manner. We design a Cross Quantized Contrastive learning strategy that jointly learns codewords and deep visual descriptors by comparing individually transformed images (views). Our method analyzes the image contents to extract descriptive features, allowing us to understand image representations for accurate retrieval. Through extensive experiments on benchmarks, we demonstrate that the proposed method yields state-of-the-art results even without supervised pretraining.
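The abstract does not spell out the training objective, so the following is only a minimal NumPy sketch of one plausible reading of "cross quantized contrastive learning": each view's descriptor is soft-assigned to per-subspace codewords, and the raw descriptor of one view is contrasted against the quantized descriptor of the other view. The function names, the temperatures `tau_q` and `tau_c`, the softmax-based soft assignment, and the codebook shape are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def soft_quantize(x, codebooks, tau_q=1.0):
    """Soft product quantization (illustrative assumption).

    x:         (N, M*d) descriptors, split into M sub-vectors of size d.
    codebooks: (M, K, d) array with K codewords per sub-space.
    Returns the (N, M*d) soft-quantized descriptors.
    """
    M, K, d = codebooks.shape
    sub = x.reshape(x.shape[0], M, d)                      # (N, M, d)
    # Squared Euclidean distance to every codeword: (N, M, K)
    dist = ((sub[:, :, None, :] - codebooks[None]) ** 2).sum(axis=-1)
    logits = -dist / tau_q
    logits -= logits.max(axis=-1, keepdims=True)           # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over codewords
    # Weighted sum of codewords in each sub-space
    z = np.einsum("nmk,mkd->nmd", weights, codebooks)
    return z.reshape(x.shape[0], M * d)

def cross_contrastive_loss(x_a, z_b, tau_c=0.5, eps=1e-12):
    """Contrast descriptors of view A against quantized descriptors of view B.

    Row i of x_a and row i of z_b come from the same image (positive pair);
    all other rows in the batch act as negatives.
    """
    x_n = x_a / (np.linalg.norm(x_a, axis=1, keepdims=True) + eps)
    z_n = z_b / (np.linalg.norm(z_b, axis=1, keepdims=True) + eps)
    sim = x_n @ z_n.T / tau_c                              # (N, N) cosine / temperature
    sim -= sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                     # cross-entropy on the diagonal

def cross_quantized_loss(x_a, x_b, codebooks, tau_q=1.0, tau_c=0.5):
    """Symmetric loss: descriptor of one view vs. quantized other view."""
    z_a = soft_quantize(x_a, codebooks, tau_q)
    z_b = soft_quantize(x_b, codebooks, tau_q)
    return 0.5 * (cross_contrastive_loss(x_a, z_b, tau_c)
                  + cross_contrastive_loss(x_b, z_a, tau_c))

# Toy usage: random arrays stand in for the network outputs of two views.
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(2, 4, 3))                     # M=2, K=4, d=3
view_a = rng.normal(size=(5, 6))
view_b = view_a + 0.05 * rng.normal(size=(5, 6))           # a lightly perturbed "view"
loss = cross_quantized_loss(view_a, view_b, codebooks)
```

In an actual training setup the descriptors would come from a deep encoder and the codebooks would be trainable parameters updated by backpropagation in an automatic-differentiation framework; the sketch only shows the forward computation.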
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Retrieval | Oxford Flowers | mAP | 47.31 | 99 |
| Image Retrieval | NUS-WIDE | mAP | 83.9 | 57 |
| Image-to-Image Retrieval | Food101 | mAP | 12.46 | 55 |
| Fine-grained Image Hashing | CUB200 2011 (test) | Collision Probability | 7.50e-4 | 30 |
| Fine-grained Image Hashing | Stanford Dogs | Collision Probability | 4.5 | 30 |
| Fine-grained Image Hashing | Stanford Dogs (test) | Collision Probability | 0.047 | 30 |
| Fine-grained Image Hashing | CUB200-2011 | Collision Probability | 0.074 | 30 |
| Image Retrieval | Stanford Dogs | mAP | 52.13 | 25 |
| Retrieval | StanfordCars | mAP | 5.08 | 25 |
| Image Retrieval | CUB200-2011 | mAP | 17.09 | 25 |