
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents

About

Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks, including search. However, existing work utilizes the generative ability of LLMs for Information Retrieval (IR) rather than direct passage ranking. The discrepancy between the pre-training objectives of LLMs and the ranking objective poses another challenge. In this paper, we first investigate generative LLMs such as ChatGPT and GPT-4 for relevance ranking in IR. Surprisingly, our experiments reveal that properly instructed LLMs can deliver results competitive with, and even superior to, state-of-the-art supervised methods on popular IR benchmarks. Furthermore, to address concerns about data contamination of LLMs, we collect a new test set called NovelEval, which is based on the latest knowledge and aims to verify the model's ability to rank unknown knowledge. Finally, to improve efficiency in real-world applications, we explore the potential for distilling the ranking capabilities of ChatGPT into small specialized models using a permutation distillation scheme. Our evaluation shows that a distilled 440M model outperforms a 3B supervised model on the BEIR benchmark. The code to reproduce our results is available at www.github.com/sunnweiwei/RankGPT.

Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, Zhaochun Ren • 2023

Related benchmarks

Task                   Dataset                             Result                 Rank
Question Answering     Scientific QA Base setting          F1 Score 51.95         38
Ranking                BEIR selected subset v1.0.0 (test)  TREC-COVID 82.34       38
Information Retrieval  Scientific QA Base setting          HitRate@1 52           38
Reranking              BEIR                                NQ NDCG@5 0.4563       35
Reranking              TREC                                NDCG@5 (DL19) 68.58    35
Abstract generation    LongLaMP                            R1 42.5                32
Passage Ranking        NQ                                  MRR 45.05              29
Recommendation         Goodreads (test)                    HR@5 57.63             29
Passage Ranking        TREC DL 2019                        R@10 90                28
Passage retrieval      Natural Questions (NQ)              Top-10 Accuracy 58.33  28
Showing 10 of 41 rows
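
For readers unfamiliar with the metric names in the benchmark rows above, the following is a minimal sketch of how NDCG@k, reciprocal rank (averaged over queries to get MRR), and HitRate@k are commonly computed for a single ranked list. It is not taken from the paper or its repository. The function names are hypothetical, the linear-gain form of DCG is one common variant among several, and the ideal ranking is assumed to come from the same list of judged passages.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items (linear gain, assumed variant)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank(relevances: Sequence[float]) -> float:
    """1 / position of the first relevant item, or 0.0 if none is relevant."""
    for i, rel in enumerate(relevances):
        if rel > 0:
            return 1.0 / (i + 1)
    return 0.0


def hit_at_k(relevances: Sequence[float], k: int) -> float:
    """1.0 if any of the top-k items is relevant, else 0.0."""
    return 1.0 if any(rel > 0 for rel in relevances[:k]) else 0.0


# Relevance labels of the passages in the order a re-ranker returned them
# (illustrative values only, not data from the paper).
ranked_labels = [0, 2, 1]
scores = {
    "ndcg@3": ndcg_at_k(ranked_labels, 3),
    "rr": reciprocal_rank(ranked_labels),
    "hit@1": hit_at_k(ranked_labels, 1),
}
```

Benchmark figures such as those listed are typically the mean of these per-query values over all test queries, sometimes scaled to a 0-100 range, which may explain why some rows report values like 0.4563 and others values like 68.58.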
