Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Billion-scale similarity search with GPUs

About

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art. We apply it in different similarity search scenarios, by proposing optimized design for brute-force, approximate and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.

Jeff Johnson, Matthijs Douze, Herv\'e J\'egou• 2017

Related benchmarks

TaskDatasetResultRank
ClusteringWiki
Clustering Time (s)17.3
16
Image RetrievalAircraft (test)
Recall@171.5
11
Fine-grained retrievalPets (test)
R@10.915
6
Fine-grained retrievalCUB-200 (test)
R@10.734
6
Fine-grained retrievalCARS 196 (test)
R@189.7
6
Fine-grained retrievalFood-101 (test)
Recall@194.1
6
Fine-grained retrievalFlowers-102 (test)
Recall@194.1
6
Fine-grained retrievalDTD (test)
R@170.6
6
Multimodal Machine TranslationTopicVD (test)
BLEU28.93
5
ClusteringCohere
Clustering Time (s)261.9
4
Showing 10 of 25 rows

Other info

Follow for update