Exploring the Best Practices of Query Expansion with Large Language Models

About

Large Language Models (LLMs) are foundational in language technologies, particularly in information retrieval (IR). Previous studies have utilized LLMs for query expansion, achieving notable improvements in IR. In this paper, we thoroughly explore the best practice of leveraging LLMs for query expansion. To this end, we introduce a training-free, straightforward yet effective framework called Multi-Text Generation Integration (\textsc{MuGI}). It leverages LLMs to generate multiple pseudo-references, integrating them with queries to enhance both sparse and dense retrievers. Our empirical findings reveal that: (1) Increasing the number of samples from LLMs benefits IR systems; (2) A balance between the query and pseudo-documents, and an effective integration strategy, is critical for high performance; (3) Contextual information from LLMs is essential, even boost a 23M model to outperform a 7B baseline model; (4) Pseudo relevance feedback can further calibrate queries for improved performance; and (5) Query expansion is widely applicable and versatile, consistently enhancing models ranging from 23M to 7B parameters. Our code and all generated references are made available at \url{https://github.com/lezhang7/Retrieval_MuGI}

Le Zhang, Yihong Wu, Qian Yang, Jian-Yun Nie• 2024

Related benchmarks

Task	Dataset	Result
Information Retrieval	BEIR (test)	--	126
Information Retrieval	SciFact BEIR (test)	nDCG@1076.6	36
Information Retrieval	TREC DL 2020	nDCG@1063.9	33
Retrieval	Bridge (test)	Hit@1075	25
Web Search Retrieval	TREC DL 19	nDCG@1071.8	22
Web Search Retrieval	TREC DL 20	nDCG@1068.9	22
Information Retrieval	DBPedia BEIR (test)	nDCG@1044.5	21
Information Retrieval	TREC DL 2019	mAP@1k46.6	16
Information Retrieval	TREC DL Hard	mAP@1k22.4	16
Information Retrieval	Touche BEIR 2020 (test)	nDCG@1032.4	15

Showing 10 of 11 rows

Other info

Follow for update

@wizwand_team Discord