Billion-scale similarity search with GPUs
About
Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art. We apply it in different similarity search scenarios, by proposing optimized design for brute-force, approximate and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Clustering | Wiki | Clustering Time (s)17.3 | 16 | |
| Image Retrieval | Aircraft (test) | Recall@171.5 | 11 | |
| Fine-grained retrieval | Pets (test) | R@10.915 | 6 | |
| Fine-grained retrieval | CUB-200 (test) | R@10.734 | 6 | |
| Fine-grained retrieval | CARS 196 (test) | R@189.7 | 6 | |
| Fine-grained retrieval | Food-101 (test) | Recall@194.1 | 6 | |
| Fine-grained retrieval | Flowers-102 (test) | Recall@194.1 | 6 | |
| Fine-grained retrieval | DTD (test) | R@170.6 | 6 | |
| Multimodal Machine Translation | TopicVD (test) | BLEU28.93 | 5 | |
| Clustering | Cohere | Clustering Time (s)261.9 | 4 |