Search-Adaptor: Embedding Customization for Information Retrieval

About

Embeddings extracted from pre-trained Large Language Models (LLMs) have significant potential to improve information retrieval and search. Beyond the zero-shot setup in which they are conventionally used, taking advantage of relevant query-corpus paired data can further boost LLM capabilities. In this paper, we propose a novel method, Search-Adaptor, for customizing LLMs for information retrieval in an efficient and robust way. Search-Adaptor modifies the embeddings generated by pre-trained LLMs and can be integrated with any LLM, including those available only via prediction APIs. On multiple English, multilingual, and multimodal retrieval datasets, we show consistent and significant performance benefits for Search-Adaptor, e.g., improvements of more than 5% in nDCG@10 for Google Embedding APIs, averaged over 14 BEIR datasets.
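
The abstract states only that Search-Adaptor modifies the embeddings of a frozen pre-trained model using query-corpus pairs. The sketch below illustrates that general idea under stated assumptions and is not the paper's architecture or objective. It assumes the embeddings are precomputed (as they would be when the model is reachable only through a prediction API), that the adapter is a residual linear map initialized at zero so training starts from the zero-shot embeddings, and that it is trained with an in-batch softmax contrastive loss plus an L2 penalty on the adapter weights. The names `adapt`, `loss_and_grad`, `train_adapter`, and `unit_rows` are hypothetical.

```python
import numpy as np


def adapt(x, W):
    """Residual linear adapter (assumed form): frozen embedding plus a learned correction x @ W."""
    return x + x @ W


def loss_and_grad(Q, C, W, tau=0.5, l2=1e-3):
    """In-batch softmax loss where query i is relevant only to corpus entry i.

    Stand-in objective for illustration, not the paper's ranking loss.
    Q, C: (n, d) frozen query / corpus embeddings (e.g. cached API outputs).
    W:    (d, d) adapter weights, the only trainable parameters.
    Returns the scalar loss and its gradient with respect to W.
    """
    n = Q.shape[0]
    A, B = adapt(Q, W), adapt(C, W)           # adapted queries and documents
    S = A @ B.T / tau                         # (n, n) query-document logits
    S = S - S.max(axis=1, keepdims=True)      # stabilize exp; softmax is unchanged
    P = np.exp(S)
    P /= P.sum(axis=1, keepdims=True)         # row-wise softmax over documents
    loss = -np.mean(np.log(np.diag(P))) + l2 * np.sum(W ** 2)

    # Backpropagate by hand: d(loss)/d(logits), then through A @ B.T and x + x @ W.
    dS = (P - np.eye(n)) / n
    dA = dS @ B / tau
    dB = dS.T @ A / tau
    grad = Q.T @ dA + C.T @ dB + 2 * l2 * W
    return loss, grad


def train_adapter(Q, C, steps=100, lr=0.01):
    """Plain gradient descent on the adapter; the embedding model itself is never updated."""
    W = np.zeros((Q.shape[1], Q.shape[1]))    # zero init = unchanged zero-shot embeddings
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(Q, C, W)
        losses.append(loss)
        W -= lr * grad
    return W, losses


def unit_rows(x):
    """L2-normalize each row."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for cached embeddings: 32 (query, relevant document) pairs, d = 16.
    Q = unit_rows(rng.normal(size=(32, 16)))
    C = unit_rows(rng.normal(size=(32, 16)))
    W, losses = train_adapter(Q, C)
    print(f"training loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because only `W` is trained and the inputs are fixed vectors, this kind of adapter needs nothing from the embedding model beyond its outputs. That is the property that lets an embedding adapter work with API-only models.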

Jinsung Yoon, Sercan O Arik, Yanfei Chen, Tomas Pfister • 2023

Related benchmarks

Task	Dataset	Metric	Result
Information Retrieval	BEIR	SciFact	0.9885	174
Information Retrieval	BEIR (test)	--	--	130
Information Retrieval	NFCorpus (test)	nDCG@10	0.442	69
Information Retrieval	SciFact (test)	nDCG@10	0.883	65
Information Retrieval	TREC-COVID	nDCG@10	84	59
Information Retrieval	MS-MARCO (test)	nDCG@10	0.698	56
Information Retrieval	MS Marco	nDCG@10	84	56
Information Retrieval	FIQA BEIR (test)	nDCG@10	45.69	44
Information Retrieval	Natural Questions	--	--	40
Information Retrieval	SciFact BEIR	nDCG@10	98.59	36

Showing 10 of 23 rows
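
Most rows above report nDCG@10 (normalized discounted cumulative gain over the top 10 retrieved results), the same metric the abstract uses for the BEIR comparison. The following is a minimal per-query sketch. It assumes linear gains rel_i / log2(i + 1) with 1-based rank i; some toolkits use exponential gains 2^rel - 1 instead, so absolute numbers can differ between implementations. The function names are illustrative.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain of the first k gains; rank r (1-based) is divided by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query, using linear gains.

    ranked_doc_ids: document ids in the order the retriever returned them.
    qrels: dict mapping document id -> graded relevance; unjudged ids count as 0.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering of judged docs
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    qrels = {"d1": 2, "d2": 1}
    print(round(ndcg_at_k(["d3", "d1", "d2"], qrels), 4))  # 0.6697
```

A dataset-level score is usually the mean of this value over all test queries. Such scores are reported either as fractions or multiplied by 100, which is why both scales (e.g. 0.442 and 45.69) appear in the table.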
