Corpus-Steered Query Expansion with Large Language Models

About

Recent studies demonstrate that query expansions generated by large language models (LLMs) can considerably enhance information retrieval systems: the LLM writes hypothetical documents that answer the query, and these serve as expansions. However, misalignments between the expansions and the retrieval corpus cause problems such as hallucinations and outdated information, because the intrinsic knowledge of LLMs is limited. Inspired by Pseudo Relevance Feedback (PRF), we introduce Corpus-Steered Query Expansion (CSQE) to promote the incorporation of knowledge embedded within the corpus. CSQE uses the relevance-assessing capability of LLMs to systematically identify pivotal sentences in the initially retrieved documents. These corpus-originated texts are then used to expand the query together with expansions drawn from the LLM's own knowledge, improving relevance prediction between the query and the target documents. Extensive experiments show that CSQE performs strongly without any training, especially on queries for which LLMs lack knowledge.
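
The pipeline described above (initial retrieval, LLM selection of pivotal sentences, then expansion with both corpus text and LLM-generated text) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the term-overlap scorer stands in for a real first-stage retriever, `keyword_sentence_picker` and `fixed_hypothetical` are hypothetical stand-ins for LLM calls, and the `query_repeats` weighting and the function names are invented for the example.

```python
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    # Lowercase and split on whitespace after dropping basic punctuation.
    return text.lower().replace(".", " ").replace(",", " ").split()


def overlap_score(query: str, doc: str) -> int:
    # Toy lexical score (assumption): product of term counts, summed over query terms.
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(q[t] * d[t] for t in q)


def retrieve(query: str, corpus: List[str], k: int) -> List[int]:
    # Return indices of the top-k documents; ties keep corpus order.
    order = sorted(range(len(corpus)),
                   key=lambda i: overlap_score(query, corpus[i]),
                   reverse=True)
    return order[:k]


def keyword_sentence_picker(query: str, doc: str) -> List[str]:
    # Hypothetical stand-in for the LLM that identifies pivotal sentences:
    # keep sentences that share at least one token with the query.
    q = set(tokenize(query))
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    return [s for s in sentences if q & set(tokenize(s))]


def fixed_hypothetical(query: str) -> str:
    # Hypothetical stand-in for an LLM-written answer document.
    return "Remdesivir is an antiviral drug."


def csqe_expand(query: str,
                corpus: List[str],
                pick_sentences: Callable[[str, str], List[str]],
                generate_hypothetical: Callable[[str], str],
                k: int = 2,
                query_repeats: int = 2) -> str:
    # 1) Initial retrieval with the unexpanded query.
    initial = retrieve(query, corpus, k)
    # 2) Collect corpus-originated sentences from the retrieved documents.
    corpus_sentences: List[str] = []
    for i in initial:
        corpus_sentences.extend(pick_sentences(query, corpus[i]))
    # 3) Combine the repeated query, corpus sentences, and the LLM-knowledge expansion.
    parts = [query] * query_repeats + corpus_sentences + [generate_hypothetical(query)]
    return " ".join(parts)
```

The expanded string would then be issued to the retriever in place of the original query. Because the sentence picker only returns text that already exists in the retrieved documents, that part of the expansion cannot introduce content absent from the corpus, which is the property the abstract emphasizes.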

Yibin Lei, Yu Cao, Tianyi Zhou, Tao Shen, Andrew Yates • 2024

Related benchmarks

Task                        Dataset                     Result          Rank
Information Retrieval       BEIR v1.0.0 (test)          --              65
Medical Question Answering  MMLU Med                    Accuracy 69.7   61
Information Retrieval       TREC-COVID                  NDCG@10 74.2    44
Medical Question Answering  BioASQ                      Accuracy 72.2   38
Medical Question Answering  MedQA US                    Accuracy 49.6   18
Factoid-style retrieval     TREC DL19                   NDCG@10 63.4    16
Information Retrieval       SciFact                     --              15
Passage retrieval           TREC DL 2019 (evaluation)   NDCG@10 0.6816  14
Passage retrieval           TREC DL 2020 (evaluation)   NDCG@10 0.6539  14
Passage retrieval           MS MARCO passage (dev)      NDCG@10 0.2906  14
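
For readers unfamiliar with the NDCG@10 metric reported in the benchmark rows, the sketch below shows one common way to compute it. It assumes linear gain (the graded relevance label itself); some implementations use an exponential gain of 2^rel - 1 instead, so scores may differ between toolkits. The function names are illustrative. The listed values also appear to mix a 0 to 100 scale (74.2, 63.4) with a 0 to 1 scale (0.6816); this function returns values on the 0 to 1 scale.

```python
import math
from typing import List


def dcg_at_k(gains: List[float], k: int) -> float:
    # Discounted cumulative gain: the item at 0-based position r is discounted by log2(r + 2).
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains: List[float], all_gains: List[float], k: int = 10) -> float:
    # ranked_gains: relevance labels of the returned documents, in ranked order.
    # all_gains: relevance labels of every judged document for the query,
    #            used to build the ideal (best possible) ranking.
    ideal = dcg_at_k(sorted(all_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0
```

A per-query score is computed this way and then averaged over all queries in the benchmark to give the single number shown in a results row.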