CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples
About
Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks. However, this achievement is preceded by extensive manual annotation, needed either to train from scratch or to fine-tune for the target task. In this work, we propose to fine-tune CNNs for image retrieval from a large collection of unordered images in a fully automated manner. We employ state-of-the-art retrieval and Structure-from-Motion (SfM) methods to obtain 3D models, which are used to guide the selection of the training data for CNN fine-tuning. We show that both hard positive and hard negative examples enhance the final performance in particular-object retrieval with compact codes.
Filip Radenović, Giorgos Tolias, Ondřej Chum • 2016
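The abstract mentions selecting hard negative examples with the help of 3D models. The following is only a minimal sketch of that idea, not the authors' implementation. It assumes each image already has a descriptor vector and a cluster label (standing in for the 3D model it belongs to), and it treats the most similar images from other clusters as hard negatives, keeping at most one per cluster. The function name `mine_hard_negatives`, the parameter `k`, and the one-per-cluster rule are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mine_hard_negatives(descriptors, cluster_ids, query_idx, k=5):
    """Return up to k indices of images that look similar to the query
    but belong to a different cluster (hypothetical helper).

    descriptors: list of descriptor vectors, one per image
    cluster_ids: list of cluster labels (assumed proxy for 3D models)
    """
    query_cluster = cluster_ids[query_idx]
    # Candidates are all images outside the query's cluster.
    candidates = [
        i for i, c in enumerate(cluster_ids) if c != query_cluster
    ]
    # Most similar first: these are the "hardest" non-matching images.
    candidates.sort(
        key=lambda i: cosine(descriptors[i], descriptors[query_idx]),
        reverse=True,
    )
    negatives, seen_clusters = [], set()
    for i in candidates:
        # Assumption: keep at most one negative per cluster for diversity.
        if cluster_ids[i] in seen_clusters:
            continue
        seen_clusters.add(cluster_ids[i])
        negatives.append(i)
        if len(negatives) == k:
            break
    return negatives
```

In a fine-tuning loop, such a selection would typically be recomputed periodically with the current network's descriptors, since the set of hard examples changes as the model improves.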
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Retrieval | Holidays | mAP | 82.5 | 115 |
| Image Retrieval | Oxford 5k | mAP | 85 | 100 |
| Image Retrieval | Oxford5k (test) | mAP | 85 | 97 |
| Image Retrieval | Paris6k (test) | mAP | 85 | 88 |
| Multi-class classification | PACS (test) | Accuracy (Art Painting) | 71.5 | 76 |
| Image Retrieval | Oxford105k (test) | mAP | 75.1 | 56 |
| Image Retrieval | Oxford 105k | mAP | 81.8 | 47 |
| Image Retrieval | Paris6k | mAP | 93.8 | 45 |
| Image Retrieval | Paris 106k (Par106k) | mAP | 89.9 | 34 |
| Image Retrieval | Paris106k (test) | mAP | 76.4 | 26 |