Fine-Tuning LLaMA for Multi-Stage Text Retrieval

About

The effectiveness of multi-stage text retrieval has been solidly demonstrated since before the era of pre-trained language models. However, most existing studies utilize models that predate recent advances in large language models (LLMs). This study seeks to explore potential improvements that state-of-the-art LLMs can bring. We conduct a comprehensive study, fine-tuning the latest LLaMA model both as a dense retriever (RepLLaMA) and as a pointwise reranker (RankLLaMA) for both passage retrieval and document retrieval using the MS MARCO datasets. Our findings demonstrate that the effectiveness of large language models indeed surpasses that of smaller models. Additionally, since LLMs can inherently handle longer contexts, they can represent entire documents holistically, obviating the need for traditional segmenting and pooling strategies. Furthermore, evaluations on BEIR demonstrate that our RepLLaMA-RankLLaMA pipeline exhibits strong zero-shot effectiveness. Model checkpoints from this study are available on HuggingFace.
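As a rough illustration of the retrieve-then-rerank flow the abstract describes, the sketch below wires two stages together using toy stand-ins. The names `embed`, `retrieve`, `rerank_score`, and `rerank` are hypothetical: `embed` is a bag-of-words encoder standing in for a dense retriever such as RepLLaMA, and `rerank_score` is a term-overlap scorer standing in for a pointwise reranker such as RankLLaMA. Neither reflects the paper's actual models or training; only the two-stage control flow is the point.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: L2-normalised term-count vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def retrieve(query, corpus, k):
    """Stage 1: score every document against the query vector, keep the top k."""
    q = embed(query)
    scored = [(dot(q, embed(doc)), doc_id) for doc_id, doc in corpus.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by id
    return [doc_id for _, doc_id in scored[:k]]


def rerank_score(query, doc):
    """Toy stand-in for a pointwise reranker: fraction of query terms in doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rerank(query, corpus, candidates):
    """Stage 2: score each (query, candidate) pair independently, then sort."""
    scored = [(rerank_score(query, corpus[d]), d) for d in candidates]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [d for _, d in scored]


# Hypothetical three-document corpus, for illustration only.
corpus = {
    "d1": "dense retrieval with large language models",
    "d2": "cooking pasta at home",
    "d3": "reranking passages with language models",
}
query = "language models for retrieval"
candidates = retrieve(query, corpus, k=2)       # cheap first pass over everything
ranking = rerank(query, corpus, candidates)     # costlier pass over candidates only
```

The design point is that the first stage touches the whole corpus with a cheap similarity computation, while the second stage applies a more expensive per-pair scorer to only the `k` survivors.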

Xueguang Ma, Liang Wang, Nan Yang, Furu Wei, Jimmy Lin • 2023

Related benchmarks

Task	Dataset	Result
Multi-hop Question Answering	2WikiMultihopQA	EM 32.24	559
Multi-hop Question Answering	HotpotQA	F1 Score 32.95	294
Question Answering	2Wiki	EM 16	241
Information Retrieval	BEIR	SciFact 0.756	120
Retrieval	TREC DL 2019	--	83
Multi-hop Question Answering	Multi-hop RAG	--	77
Information Retrieval	BEIR v1.0.0 (test)	ArguAna 56	75
Information Retrieval	COVID	--	50
Information Retrieval	FIQA BEIR (test)	nDCG@10 48.1	44
Conversational Retrieval	QReCC (test)	Recall@10 20.4	43

Showing 10 of 71 rows
