RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search

About

Searching for approximate nearest neighbors (ANN) in high-dimensional Euclidean space is a pivotal problem. Recently, with the help of fast SIMD-based implementations, Product Quantization (PQ) and its variants can often estimate the distances between vectors efficiently and accurately, and they have achieved great success in in-memory ANN search. Despite their empirical success, we note that these methods do not have a theoretical error bound and are observed to fail disastrously on some real-world datasets. Motivated by this, we propose a new randomized quantization method named RaBitQ, which quantizes $D$-dimensional vectors into $D$-bit strings. RaBitQ guarantees a sharp theoretical error bound and provides good empirical accuracy at the same time. In addition, we introduce efficient implementations of RaBitQ that estimate distances with bitwise or SIMD-based operations. Extensive experiments on real-world datasets confirm that (1) our method outperforms PQ and its variants in terms of the accuracy-efficiency trade-off by a clear margin and (2) its empirical performance is well aligned with our theoretical analysis.

Jianyang Gao, Cheng Long • 2024
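
To make the idea of quantizing a $D$-dimensional vector into a $D$-bit string concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, the inputs are assumed to be unit vectors, and the specific choices (a random orthogonal rotation, one sign bit per rotated coordinate, and rescaling the estimate by each vector's inner product with its own quantized version) are assumptions not spelled out in the abstract above. The bitwise and SIMD fast paths mentioned in the abstract are not shown; the sketch unpacks the bits and uses a plain floating-point dot product.

```python
import numpy as np

def random_rotation(dim, rng):
    """Sample a random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # flip column signs using the diagonal of R

def quantize(unit_vecs, rotation):
    """Encode each unit vector (one per row) as D sign bits plus one scalar."""
    dim = rotation.shape[0]
    rotated = unit_vecs @ rotation             # coordinates in the rotated basis
    codes = np.packbits(rotated > 0, axis=1)   # D bits packed into bytes
    # Inner product between each vector and its own quantized version
    # (assumed normalization: quantized entries are +-1/sqrt(D)).
    align = np.abs(rotated).sum(axis=1) / np.sqrt(dim)
    return codes, align

def estimate_inner_product(codes, align, unit_query, rotation):
    """Estimate <o, q> for every encoded vector o and one unit query q."""
    dim = rotation.shape[0]
    signs = 2.0 * np.unpackbits(codes, axis=1, count=dim) - 1.0  # bits -> +-1
    q_rot = unit_query @ rotation
    return (signs @ q_rot) / np.sqrt(dim) / align
```

For unit vectors, a squared Euclidean distance follows from the estimated inner product as `2 - 2 * estimate`, so the same codes can serve distance estimation.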

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Long-context Language Understanding | LongBench-e | Average Score: 43.52 | 93 |
| Index Construction | SIFT-1M | Construction Time (s): 1.65 | 8 |
| Inner Product Estimation | High-dimensional vectors | Inner Product Distortion (b=1): 0.571 | 5 |
| Vector Reconstruction | High-dimensional vectors | MSE Distortion (b=1): 0.363 | 5 |
| Index Construction | Cohere 10M | Construction Time: 1.55 | 2 |
| Index Construction | SIFT 100M | Index Construction Time (h): 4.64 | 2 |
| Index Construction | SIFT1B | Index Construction Time (h): 98.63 | 2 |