Progressive Searching for Retrieval in RAG

About

Retrieval-Augmented Generation (RAG) is a promising technique for mitigating two key limitations of large language models (LLMs): outdated information and hallucinations. A RAG system stores documents as embedding vectors in a database. Given a query, a search is executed to find the most closely related documents, and the top-matching documents are then inserted into the LLM's prompt to generate a response. Efficient and accurate search is therefore critical for RAG to retrieve relevant information. We propose a cost-effective search algorithm for the retrieval process. Our progressive search algorithm incrementally refines the candidate set through a hierarchy of searches, starting from low-dimensional embeddings and progressing to the higher target dimensionality. This multi-stage approach reduces retrieval time while preserving the desired accuracy. Our findings demonstrate that progressive search in RAG systems achieves a balance between dimensionality, speed, and accuracy, enabling scalable and high-performance retrieval even for large databases.
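
The multi-stage refinement described above can be illustrated with a minimal sketch. The abstract does not say how the low-dimensional embeddings are obtained or how many candidates survive each stage, so this sketch makes two assumptions: the first `dim` coordinates of an embedding serve as its lower-dimensional version, and each stage keeps a fixed number of candidates. The names `progressive_search`, `stages`, and `keep` are hypothetical and not taken from the paper.

```python
import math
import random


def cosine(a, b, dim):
    """Cosine similarity computed on the first `dim` coordinates only."""
    dot = sum(x * y for x, y in zip(a[:dim], b[:dim]))
    norm_a = math.sqrt(sum(x * x for x in a[:dim]))
    norm_b = math.sqrt(sum(y * y for y in b[:dim]))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def progressive_search(query, docs, stages, top_k=1):
    """Shrink the candidate set stage by stage.

    stages: list of (dim, keep) pairs with increasing dim (assumed schedule).
    Each stage re-ranks the surviving candidates on the first `dim`
    coordinates and keeps the best `keep` of them. The last stage should
    use the full target dimensionality.
    """
    candidates = list(range(len(docs)))
    for dim, keep in stages:
        candidates.sort(key=lambda i: cosine(query, docs[i], dim), reverse=True)
        candidates = candidates[:keep]
    return candidates[:top_k]


# Toy data: 1,000 random 64-dimensional "document" vectors (illustrative only).
random.seed(0)
docs = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(1000)]
query = list(docs[42])  # a query identical to one stored document

# Cheap 8-d pass over everything, then 32-d over 100 survivors, then 64-d over 10.
result = progressive_search(query, docs, stages=[(8, 100), (32, 10), (64, 1)])
print(result)
```

Only the first stage touches every document, and it does so on a short prefix of each vector; the expensive full-dimensional comparison runs on a handful of survivors. The trade-off is that a true nearest neighbor can be pruned early if `keep` is too small for the chosen low dimension.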

Taehee Jeong, Xingzhe Zhao, Peizu Li, Markus Valvur, Weihua Zhao • 2026

Related benchmarks

Task | Dataset | Result | Rank
Nearest Neighbor Retrieval | 1 million documents vector database gte-Qwen2-7B-instruct embeddings 1.0 | Top-1 Accuracy: 95.02 | 10
Text Retrieval | text-embedding-3-large embeddings (1M documents) | Accuracy: 94.45 | 10
