Progressive Searching for Retrieval in RAG

About

Retrieval-Augmented Generation (RAG) is a promising technique for mitigating two key limitations of large language models (LLMs): outdated information and hallucinations. A RAG system stores documents as embedding vectors in a database. Given a query, a search is executed to find the most closely related documents, and the top-matching documents are then inserted into the LLM's prompt to generate a response. Efficient and accurate search is therefore critical for RAG to retrieve relevant information. We propose a cost-effective search algorithm for the retrieval process. Our progressive search algorithm incrementally refines the candidate set through a hierarchy of searches, starting from low-dimensional embeddings and progressing to the higher target dimensionality. This multi-stage approach reduces retrieval time while preserving the desired accuracy. Our findings demonstrate that progressive search in RAG systems balances dimensionality, speed, and accuracy, enabling scalable, high-performance retrieval even for large databases.
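The multi-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the embeddings remain meaningful when truncated to a prefix of their dimensions (as with Matryoshka-style models), it uses brute-force NumPy search in place of a vector database, and the function name `progressive_search` and the stage schedule (`dims`, `keep`) are hypothetical choices made only for illustration.

```python
import numpy as np


def normalize(x):
    """L2-normalize along the last axis so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def progressive_search(query, docs, dims, keep):
    """Return candidate document indices, best match first.

    query: (D,) embedding; docs: (N, D) embedding matrix.
    dims:  increasing dimensionalities, ending at the target D (assumed schedule).
    keep:  number of candidates retained after each stage (assumed schedule).
    """
    candidates = np.arange(docs.shape[0])
    for d, k in zip(dims, keep):
        # Score only the surviving candidates, using the first d dimensions.
        q = normalize(query[:d])
        sub = normalize(docs[candidates, :d])
        scores = sub @ q
        # Keep the k highest-scoring candidates, sorted by descending score.
        k = min(k, candidates.size)
        top = np.argpartition(-scores, k - 1)[:k]
        top = top[np.argsort(-scores[top])]
        candidates = candidates[top]
    return candidates


if __name__ == "__main__":
    # Synthetic data: the query is a slightly perturbed copy of document 42.
    rng = np.random.default_rng(0)
    docs = rng.normal(size=(2000, 64))
    query = docs[42] + 0.01 * rng.normal(size=64)
    print(progressive_search(query, docs, dims=[8, 32, 64], keep=[100, 10, 1]))
```

In this sketch only the first stage scans the whole collection, and it does so at the lowest dimensionality; the later, more expensive stages touch only the shortlisted candidates. The accuracy risk is that the true best match is dropped in an early stage, which is why the sizes in `keep` trade speed against accuracy.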

Taehee Jeong, Xingzhe Zhao, Peizu Li, Markus Valvur, Weihua Zhao • 2026

Related benchmarks

Task	Dataset	Result	Rank
Nearest Neighbor Retrieval	1 million documents vector database gte-Qwen2-7B-instruct embeddings 1.0	Top-1 Accuracy: 95.02	10
Text Retrieval	text-embedding-3-large embeddings (1M documents)	Accuracy: 94.45	10

