NaviX: A Native Vector Index Design for Graph DBMSs With Robust Predicate-Agnostic Search Performance
About
There is an increasing demand for extending existing DBMSs with vector indices so that they become unified systems capable of supporting modern predictive applications, which require joint querying of vector embeddings together with the structured properties and connections of objects. We present NaviX, a native vector index for graph DBMSs (GDBMSs) that has two main design goals. First, we aim to implement a disk-based vector index that leverages the core storage and query-processing capabilities of the underlying GDBMS. To this end, NaviX is built on the Hierarchical Navigable Small-World (HNSW) graph, which itself is a graph-based structure. Second, we aim to support predicate-agnostic filtered vector search queries, in which the k nearest neighbors (kNNs) of a query vector vQ are searched only within an arbitrary subset S of vectors defined by an ad-hoc selection sub-query QS. We adopt a prefiltering approach that evaluates QS first and passes the full description of subset S to the kNN search operator. We study how to design a prefiltering search algorithm that remains robust under varying selectivities and under different correlations between subset S and query vector vQ. We propose an adaptive algorithm that uses the local selectivity of each vector in the HNSW graph to choose an appropriate heuristic at every iteration of the kNN search. Finally, We demonstrate NaviX's robustness and efficiency through extensive experiments against both existing prefiltering- and postfiltering-based baselines.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Filtered Nearest Neighbor Search | MSTuring 10M subset 1.0 (train) | Indexing Time (s)1.28e+3 | 12 | |
| Filtered Nearest Neighbor Search | ARXIV-2M-label 1.0 (train) | Indexing Time (s)2.38e+3 | 9 | |
| Filtered Nearest Neighbor Search | SIFT-1M-label 1.0 (train) | Indexing Time (s)54 | 9 | |
| Filtered Nearest Neighbor Search | YFCC-10M subset 1.0 (train) | Indexing Time (s)1.22e+3 | 8 | |
| Filtered Nearest Neighbor Search | LAION 1M 1.0 (train) | Indexing Time (s)145 | 8 | |
| Filtered Nearest Neighbor Search | LAION 5M subset 1.0 (train) | Indexing Time (s)872 | 8 | |
| Filtered Nearest Neighbor Search | LAION 25M 1.0 (train) | Indexing Time (s)4.98e+3 | 8 | |
| Filtered Nearest Neighbor Search | ARXIV-2M range 1.0 (train) | Indexing time (s)2.08e+3 | 6 | |
| Filtered Nearest Neighbor Search | MSTuring 10M-range 1.0 (train) | Indexing Time (s)1.28e+3 | 6 |