
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents

About

Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks, including search engines. However, existing work utilizes the generative ability of LLMs for Information Retrieval (IR) rather than direct passage ranking. The discrepancy between the pre-training objectives of LLMs and the ranking objective poses another challenge. In this paper, we first investigate generative LLMs such as ChatGPT and GPT-4 for relevance ranking in IR. Surprisingly, our experiments reveal that properly instructed LLMs can deliver competitive, even superior results to state-of-the-art supervised methods on popular IR benchmarks. Furthermore, to address concerns about data contamination of LLMs, we collect a new test set called NovelEval, based on the latest knowledge and aiming to verify the model's ability to rank unknown knowledge. Finally, to improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models using a permutation distillation scheme. Our evaluation results show that a distilled 440M model outperforms a 3B supervised model on the BEIR benchmark. The code to reproduce our results is available at www.github.com/sunnweiwei/RankGPT.
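As a rough illustration of what ranking by permutation can look like in practice, the sketch below takes a ranking string that an instructed LLM might emit and reorders the candidate passages accordingly. The output format `[2] > [3] > [1]`, the function name `apply_permutation`, and the fallback for omitted or invalid identifiers are assumptions made for illustration; the abstract does not specify them.

```python
import re

def apply_permutation(passages, llm_output):
    """Reorder passages using a ranking string such as "[2] > [3] > [1]".

    Hypothetical helper: the bracketed 1-based identifier format is an
    assumption, not something stated in the abstract.
    """
    # Extract identifiers in the order the model listed them (convert to 0-based).
    ids = [int(m) - 1 for m in re.findall(r"\[(\d+)\]", llm_output)]
    seen = set()
    order = []
    for i in ids:
        # Skip out-of-range or repeated identifiers.
        if 0 <= i < len(passages) and i not in seen:
            seen.add(i)
            order.append(i)
    # Append any passages the model omitted, keeping their original order.
    order += [i for i in range(len(passages)) if i not in seen]
    return [passages[i] for i in order]
```

Falling back to the original order for omitted passages keeps the output a full permutation of the input even when the model's answer is incomplete or malformed.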

Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, Zhaochun Ren • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Document Ranking | TREC DL Track 2020 (test) | nDCG@10: 0.6785 | 63 |
| Information Retrieval | BRIGHT | Biology nDCG@10: 33.8 | 45 |
| Question Answering | Scientific QA Base setting | F1 Score: 51.95 | 38 |
| Ranking | BEIR selected subset v1.0.0 (test) | TREC-COVID: 82.34 | 38 |
| Information Retrieval | Scientific QA Base setting | HitRate@1: 52 | 38 |
| Reranking | BEIR | NQ NDCG@5: 0.4563 | 35 |
| Information Retrieval | BRIGHT 1.0 (test) | nDCG@10 (Avg): 24.7 | 35 |
| Reranking | TREC | NDCG@5 (DL19): 68.58 | 35 |
| Abstract generation | LongLaMP | R1: 42.5 | 32 |
| Passage Ranking | NQ | MRR: 45.05 | 29 |
Showing 10 of 55 rows
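
For readers unfamiliar with the metrics listed, here is a minimal sketch of nDCG@k and MRR computed from a ranked list of graded relevance labels. It assumes the exponential-gain form of DCG and normalizes against the ideal ordering of the same list; individual benchmarks may use linear gain or normalize over all judged documents, and some rows above report scores scaled by 100. The function names are illustrative.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (exponential gain)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so the top item is discounted by log2(2) = 1
        for rank, rel in enumerate(relevances[:k])
    )

def ndcg_at_k(relevances, k=10):
    """nDCG@k, normalized by the ideal ordering of the same labels (an assumption)."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def reciprocal_rank(relevances):
    """Reciprocal rank of the first relevant item; average over queries to get MRR."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0
```

For example, `ndcg_at_k([3, 2, 1])` is 1.0 because the list is already in ideal order, while moving the only relevant passage from first to second place lowers the score.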
