MCGI: Manifold-Consistent Graph Indexing for Billion-Scale Disk-Resident Vector Search
About
Graph-based Approximate Nearest Neighbor (ANN) search often suffers from performance degradation in high-dimensional spaces due to the Euclidean-Geodesic mismatch, where greedy routing diverges from the underlying data manifold. To address this challenge, this paper presents Manifold-Consistent Graph Indexing (MCGI), a geometry-aware and disk-resident indexing method that leverages Local Intrinsic Dimensionality (LID) to dynamically adapt search strategies to the intrinsic geometry of data. Unlike conventional algorithms that treat dimensions uniformly, MCGI modulates its beam search budget based on in-situ geometric analysis, which reduces sensitivity to data-specific hyperparameters by replacing a single scalar with a geometry-informed range that remains stable across datasets of varying dimensionality. Theoretical analysis demonstrates that MCGI provides robust approximation by preserving manifold-consistent topological connectivity. Extensive evaluations against five industry-standard baselines across five datasets up to billion scales confirm the advantages of the proposed approach.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Approximate Nearest Neighbor Search | GIST1M | Peak QPS (R@10 >= 95%)375.1 | 3 |